N-Tuple Output¶
CP Algorithms come with their own output mechanism that provides n-tuples based on the configuration provided. There are reasonable defaults for most blocks, but also many customization options. This includes a number of options for MC-Only and DSID-specific outputs.
The output system uses the systematics handling system to write out only the minimal set of systematics for each variable. For MET it also writes out just the Final term by default, instead of all terms.
The system also guarantees that you get the same n-tuple output format (a.k.a. schema) for each configuration, even if no events were processed. This is different from a lot of output systems in ATLAS and can sometimes require some extra work, but it brings major advantages for processing the n-tuples in RDataFrame and other analysis frameworks.
For a practical introduction to using n-tuple output, see the tutorial.
User Configuration Level¶
The Output section in the YAML configuration file controls what goes
into the output n-tuple. Configuration blocks automatically register
variables they create, so the Output section primarily specifies which
containers to write and optionally controls which variables are enabled.
Basic Structure¶
The minimal Output section specifies the tree name and the containers to write:
Output:
  treeName: analysis
  containers:
    mu_: AnaMuons      # Muon variables prefixed with "mu_"
    el_: AnaElectrons  # Electron variables prefixed with "el_"
    jet_: AnaJets      # Jet variables prefixed with "jet_"
    met_: AnaMET       # MET variables prefixed with "met_"
    "": EventInfo      # EventInfo variables with no prefix
The treeName option specifies the name of the output TTree (default:
analysis). The containers dictionary maps output prefixes to
container names. The prefix is prepended to all branch names from that
container. Use an empty string "" for no prefix.
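As a rough illustration of the prefix rule, the sketch below builds branch names from a prefix-to-container mapping in plain Python. The helper branch_names and the variable lists are hypothetical and only mimic the naming rule described above. They are not part of CP Algorithms, and systematics suffixes are ignored here.

```python
# Hypothetical illustration of the prefix rule; not CP Algorithms code.
def branch_names(containers, variables):
    """Prepend each output prefix to the variable names of its container.

    containers: dict mapping output prefix -> container name
    variables:  dict mapping container name -> list of variable names
    """
    names = []
    for prefix, container in containers.items():
        for var in variables.get(container, []):
            names.append(prefix + var)
    return names


containers = {"mu_": "AnaMuons", "": "EventInfo"}
# Assumed example variables; the real list depends on the configured blocks.
variables = {"AnaMuons": ["pt", "eta"], "EventInfo": ["runNumber"]}

print(branch_names(containers, variables))
```

With the empty prefix, EventInfo variables keep their plain names, while muon variables pick up the mu_ prefix.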
Automatic Variable Registration¶
Configuration blocks automatically register output variables as they are configured. For example, when you configure muons with a working point:
Muons:
  - containerName: AnaMuons
    WorkingPoint:
      - selectionName: medium
        quality: Medium
        isolation: Loose_VarRad
The muon configuration blocks automatically register kinematic variables (pt, eta, phi, m), efficiency scale factors, and other relevant variables. These will be written to the output without requiring explicit specification.
Enabling and Disabling Variables¶
The commands option provides regex-based control over which variables are written:
Output:
  treeName: analysis
  commands:
    - disable actualInteractionsPerCrossing  # Disable specific variable
    - enable someDisabledVariable            # Enable disabled variable
    - rename mu_(.*) muon_\1                 # Rename branches matching pattern
    - optional enable jet_ghostTrack.*       # Optional pattern (no error if not found)
  containers:
    mu_: AnaMuons
    el_: AnaElectrons
Commands are processed in order and support three operations:
- enable pattern - Enable all variables matching the regex pattern
- disable pattern - Disable all variables matching the regex pattern
- rename pattern replacement - Rename variables matching pattern using regex substitution
By default, if a pattern doesn't match any variables, an error is
raised. Prefix the command with optional to suppress this error for
non-critical patterns:
commands:
  - optional enable jet_ghostTrack.*  # Won't error if no ghostTrack variables exist
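The command semantics can be mimicked in a few lines of Python. This is a minimal sketch and not the actual implementation: the function apply_commands is hypothetical, and it assumes that a pattern has to match the whole branch name and that disabled branches are renamed as well.

```python
import re


def apply_commands(branches, commands):
    """Apply enable/disable/rename commands in order.

    branches: dict mapping branch name -> enabled flag
    commands: command strings as written under 'commands' in the YAML
    Assumption: a pattern has to match the whole branch name.
    """
    result = dict(branches)
    for command in commands:
        words = command.split()
        optional = words[0] == "optional"
        if optional:
            words = words[1:]
        action, pattern = words[0], re.compile(words[1])
        if action not in ("enable", "disable", "rename"):
            raise ValueError(f"unknown command: {command}")
        matches = [(name, pattern.fullmatch(name)) for name in list(result)]
        matches = [(name, m) for name, m in matches if m]
        if not matches and not optional:
            raise ValueError(f"no variable matches: {command}")
        for name, match in matches:
            if action == "rename":
                # Expand back-references such as \1 in the replacement.
                result[match.expand(words[2])] = result.pop(name)
            else:
                result[name] = (action == "enable")
    return result


branches = {"mu_pt": True, "mu_eta": True, "jet_timing": False}
commands = [
    "disable mu_eta",
    "enable jet_timing",
    r"rename mu_(.*) muon_\1",
    "optional enable jet_ghostTrack.*",
]
print(apply_commands(branches, commands))
```

Because commands run in order, the rename in this example sees the enabled state left behind by the earlier disable and enable commands.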
Explicit Variable Specification¶
When you need variables that aren't automatically registered by
configuration blocks, use vars to specify them explicitly:
Output:
  vars:
    - EventInfo.actualInteractionsPerCrossing -> actualMuScaled
    - AnaMuons_NOSYS.muonType -> mu_muonType type=uint16
    - MissingET.px -> met_px metTerm=Final
  containers:
    mu_: AnaMuons
    "": EventInfo
The format is Container.variableName -> outputName, optionally
followed by type=typename if automatic type detection fails. The
container name may include a _NOSYS or _%SYS% suffix to handle
variables affected by systematics. For MET variables, the MET term can
be selected by appending metTerm=termName (for example metTerm=Final).
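To make the string format concrete, here is a small parser sketch in Python. The function parse_var_spec and the dictionary layout it returns are hypothetical and only follow the format as described above. The real parsing happens inside the output configuration.

```python
def parse_var_spec(spec):
    """Split 'Container.variable -> outputName key=value ...' into parts."""
    source, arrow, rest = spec.partition("->")
    words = rest.split()
    if not arrow or not words:
        raise ValueError(f"expected 'Container.variable -> outputName': {spec}")
    container, _, variable = source.strip().partition(".")
    options = {}
    for word in words[1:]:
        # Trailing words are key=value options such as type= or metTerm=.
        key, _, value = word.partition("=")
        options[key] = value
    return {
        "container": container,
        "variable": variable,
        "output": words[0],
        "options": options,
    }


print(parse_var_spec("AnaMuons_NOSYS.muonType -> mu_muonType type=uint16"))
print(parse_var_spec("MissingET.px -> met_px metTerm=Final"))
```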
There are also older options specifically for MET variables (metVars
and truthMetVars) that automatically pick up the chosen MET term (see
MET Handling below). They are less relevant now that the term can be
specified directly in the variable string:
Output:
  metVars:
    - AnaMET_%SYS%.met -> met_%SYS%
    - AnaMET_%SYS%.phi -> met_phi_%SYS%
  truthMetVars:
    - TruthMET_NOSYS.met -> truth_met
MC-Only Content¶
To avoid crashes when processing data, declare MC-only content separately:
Output:
  containers:
    mu_: AnaMuons
    el_: AnaElectrons
  containersOnlyForMC:
    truth_mu_: TruthMuons
    truth_el_: TruthElectrons
  varsOnlyForMC:
    - EventInfo.truthEventWeight -> truthWeight
The containersOnlyForMC and varsOnlyForMC options work identically
to their non-MC counterparts but are only active when processing Monte
Carlo. This allows a single configuration file to work for both data and
MC.
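Conceptually, the MC-only options are merged into the regular ones only for Monte Carlo input. The sketch below illustrates that idea. The function active_containers and the is_mc flag are hypothetical names chosen for this example, not the actual implementation.

```python
def active_containers(output_config, is_mc):
    """Return the prefix -> container mapping that applies to this sample."""
    containers = dict(output_config.get("containers", {}))
    if is_mc:
        # MC-only containers are added on top of the common ones.
        containers.update(output_config.get("containersOnlyForMC", {}))
    return containers


output_config = {
    "containers": {"mu_": "AnaMuons", "el_": "AnaElectrons"},
    "containersOnlyForMC": {"truth_mu_": "TruthMuons"},
}

print(active_containers(output_config, is_mc=False))
print(active_containers(output_config, is_mc=True))
```

The same configuration dictionary therefore yields a smaller container list for data and a larger one for MC.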
DSID-Based Filtering¶
For more fine-grained control, restrict containers to specific dataset IDs:
Output:
  containers:
    mu_: AnaMuons
  containersOnlyForDSIDs:
    signal_jet_:
      - 410470   # Specific DSID
      - "50.*"   # Regex pattern matching multiple DSIDs
This is particularly useful when different samples require different output containers, such as writing signal-specific containers only for signal samples.
You can also apply commands conditionally based on DSID:
Output:
  commandsOnlyForDSIDs:
    410470:  # Only for this DSID
      - enable signal_specific_variable
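A DSID list that mixes plain numbers and regex patterns can be matched as sketched below. The helper dsid_matches is hypothetical, and it assumes that each entry has to match the full DSID string. The actual matching rule may differ.

```python
import re


def dsid_matches(dsid, patterns):
    """Return True if the DSID matches any entry (integer or regex string)."""
    # Assumption: every entry is treated as a regex on the whole DSID.
    return any(re.fullmatch(str(p), str(dsid)) for p in patterns)


patterns = [410470, "50.*"]
for dsid in (410470, 501234, 364100):
    print(dsid, dsid_matches(dsid, patterns))
```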
MET Handling¶
By default, MET containers extract only a single term rather than writing the full container:
Output:
  metTermName: Final         # Default MET term
  truthMetTermName: NonInt   # Default truth MET term
  containers:
    met_: AnaMET
The metTermName option (default: "Final") specifies which MET term
to extract from the MET container. The truthMetTermName option
(default: "NonInt") does the same for truth MET.
To write a MET container with all terms (for special studies), use containersFullMET:
Output:
  containers:
    met_: AnaMET        # Just the Final term
  containersFullMET:
    met_all_: AnaMET    # All MET terms
Note that containersFullMET requires a different prefix than
containers to avoid conflicts. Keeping the Final-term MET under its
usual prefix and adding the full set of terms under a separate prefix
means that existing analysis scripts continue to run unchanged, while
the individual terms can be accessed where needed.
Selection Flag Aggregation¶
The output system can create aggregated selection flag branches that combine all cuts for a given working point:
Output:
  storeSelectionFlags: True           # Default
  selectionFlagPrefix: select         # Default prefix
  skipRedundantSelectionFlags: True   # Default
When storeSelectionFlags is enabled (the default), a single boolean
flag is created for each working point that combines all selection cuts.
For example, a muon with a "medium" working point would get a
select_medium branch.
The skipRedundantSelectionFlags option (default: True) prevents
writing selection flags for selections that always pass, reducing
unnecessary output.
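The following sketch shows one way to picture the aggregation. It is not the actual implementation: the functions are hypothetical, and it assumes that a selection "always passes" when no cuts are registered for it.

```python
def combined_flag(cut_results):
    """An object passes the aggregated flag only if it passes every cut."""
    return all(cut_results)


def selection_flag_branches(cuts_by_selection, prefix="select",
                            skip_redundant=True):
    """Return the names of the aggregated selection flag branches.

    cuts_by_selection: dict mapping selection name -> list of cut names
    Assumption: a selection without cuts always passes and is redundant.
    """
    branches = []
    for selection, cuts in cuts_by_selection.items():
        if skip_redundant and not cuts:
            continue
        branches.append(f"{prefix}_{selection}")
    return branches


# Hypothetical selections and cut names for illustration.
cuts = {"medium": ["quality", "isolation"], "baseline": []}
print(selection_flag_branches(cuts))
print(selection_flag_branches(cuts, skip_redundant=False))
```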
Advanced Options¶
alwaysAddNosys: If set to True, all branches get a systematics
suffix, even those unaffected by systematics. This can be useful for
frameworks that expect all branches to have systematic naming:
Output:
  alwaysAddNosys: True
  # Now even eta/phi get "_NOSYS" suffix
streamName: The name of the output stream (default: ANALYSIS).
This is primarily relevant when integrating with larger Athena jobs:
Output:
  streamName: ANALYSIS  # Default
nonContainers: Explicitly declare which container names should be
treated as non-containers (scalars rather than vectors). EventInfo is
included by default:
Output:
  nonContainers: ['EventInfo', 'MyCustomScalar']
Complete Example¶
Here's a comprehensive example showing many output options:
Output:
  treeName: analysis
  # Basic containers
  containers:
    mu_: AnaMuons
    el_: AnaElectrons
    jet_: AnaJets
    met_: AnaMET
    "": EventInfo
  # MC-only containers
  containersOnlyForMC:
    truth_mu_: TruthMuons
    truth_jet_: AntiKt4TruthJets
  # DSID-specific containers
  containersOnlyForDSIDs:
    signal_:
      - "410.*"  # All 410xxx samples
  # Explicit variables
  vars:
    - EventInfo.runNumber -> runNumber
    - EventInfo.eventNumber -> eventNumber
  varsOnlyForMC:
    - EventInfo.mcChannelNumber -> mcChannelNumber
  # MET variables
  metVars:
    - AnaMET_%SYS%.sumet -> met_sumet_%SYS%
  # Control which variables are written
  commands:
    - disable .*averageInteractionsPerCrossing
    - enable jet_timing
    - rename mu_(.*) muon_\1
  # MET configuration
  metTermName: Final
  truthMetTermName: NonInt
  # Selection flags
  storeSelectionFlags: True
  skipRedundantSelectionFlags: True
Thinning¶
The Thinning block is necessary to prepare containers for output by
creating view containers that contain only the selected objects. Without
thinning, all objects from the input containers would be written to the
output regardless of any selections applied.
Thinning:
  - containerName: AnaMuons
    selectionName: medium
  - containerName: AnaElectrons
    selectionName: loose
  - containerName: AnaJets
    selectionName: passJvt
Options:
- containerName: The input container to thin
- selectionName: The selection to use for thinning (objects passing this selection are kept)
- outputName: An optional name for the output (thinned) container. If not specified, the input container name is used (creating a new copy under that name).
- selection: An optional explicit selection decoration to use (alternative to selectionName)
- deepCopy: If True, creates a deep copy of objects (default: False)
- sortPt: If True, sorts output objects by pt (not supported with systematics)
The thinning process creates a view container that includes only objects passing the specified selection. This is critical for reducing output size and ensuring that only relevant objects are written to the n-tuple. Please note that if an object passes for just a single systematic it will be accepted for all systematics. This is necessary so that the individual objects in the output vectors line up for all branches.
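The rule that an object is kept if it passes for any systematic amounts to a logical OR across systematics. A minimal sketch, with a hypothetical function thinning_mask and made-up pass/fail values:

```python
def thinning_mask(passes_by_systematic):
    """Keep an object if it passes the selection for at least one systematic.

    passes_by_systematic: dict mapping systematic name -> list of booleans,
    one entry per object, in the same object order for every systematic.
    """
    per_object = zip(*passes_by_systematic.values())
    return [any(flags) for flags in per_object]


# Made-up example values; the systematic names are only illustrative.
passes = {
    "NOSYS": [True, False, False],
    "MUON_SCALE__1up": [True, True, False],
}
print(thinning_mask(passes))
```

The second object fails for NOSYS but is still kept, so that the output vectors have the same length and object order for every systematic.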
Configuration Block Level¶
The output system automatically collects variables from configuration blocks as they are configured, eliminating the need for users to manually specify long lists of output branches. Each configuration block registers the output variables it creates, and these are automatically written to the output tree based on the containers specified in the Output section.
When writing configuration blocks that produce output variables, you use
the addOutputVar() method to register variables with the output
system. This method is called during block configuration and tells the
Output block which variables should be available for writing. The method
signature is:
config.addOutputVar (containerName, variableName, outputName,
*, noSys=False, enabled=True, auxType=None)
The parameters are:
- containerName: The container the variable belongs to.
- variableName: The name of the decoration on the container. For variables that have systematic variations, include %SYS% in the name (e.g. 'effSF_%SYS%'), unless those systematic variations are handled via shallow copies (e.g. 'pt').
- outputName: The name used in the output ntuple/file.
- noSys: If True, the variable has no systematic variations and will only be written once (not per systematic). Use this for quantities like eta or phi that don't change with systematics.
- enabled: If False, the variable is registered but not written. The user can then enable that variable from the configuration without having to specify any extra information.
- auxType: Override the type for the output variable. Common values are 'float', 'int', 'char'. This is occasionally needed when the automatic type detection doesn't work correctly (e.g. copying variables from the input file that don't have an accessor defined in the algorithm).
Basic Usage¶
# Kinematic variables - pt varies with systematics, others don't
config.addOutputVar (self.containerName, 'pt', 'pt')
config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)
config.addOutputVar (self.containerName, 'phi', 'phi', noSys=True)
config.addOutputVar (self.containerName, 'charge', 'charge', noSys=True)
Conditional Variables¶
When a variable is only produced by an algorithm that runs
conditionally, the addOutputVar call should be inside the same
conditional:
if self.decorateExtraVariables:
    alg = config.createAlgorithm( 'CP::TauExtraVariablesAlg',
                                  'TauExtraVariablesAlg' )
    alg.taus = config.readName (self.containerName)
    config.addOutputVar (self.containerName, 'nTracksCharged', 'nTracksCharged', noSys=True)
This ensures that the output variable is registered only when the algorithm that produces it actually runs.
Type Override¶
Use auxType when the automatic type detection doesn't produce the
desired result:
config.addOutputVar (self.containerName, 'NNDecayMode', 'NNDecayMode', noSys=True, auxType='int')
config.addOutputVar (self.containerName, 'passTATTauMuonOLR', 'passTATTauMuonOLR', noSys=True, auxType='char')
Disabled Variables¶
Sometimes you have variables that seem useful, but only to certain users. In that case you can add them as disabled variables, meaning they won't be written out by default:
# Efficiency value (disabled by default to reduce output size)
alg.efficiencyDecorationName = f'eff_{workingPoint}_%SYS%'
config.addOutputVar(containerName, alg.efficiencyDecorationName, f'eff_{workingPoint}',
enabled=False)
Users can then enable these with commands if needed:
Output:
  commands:
    - enable .*specialVariable
The advantages of adding a disabled variable versus having the user add them by hand are that it is typically easier for the user to just enable the variable, and that there is a greater consistency between n-tuples of different users.
Explicit Type Specification¶
In some cases the automatic type detection doesn't work, in which case you will have to specify it manually:
# Force specific type
config.addOutputVar(containerName, 'muonType', 'muonType',
noSys=True, auxType='uint16')
Integration with Output System¶
The addOutputVar() method stores the variable information in the
ConfigAccumulator. When the Output block runs, it retrieves all
registered variables for each container specified in the containers
option. The Output block then:
- Retrieves all registered output variables for each container
- Applies any commands to enable/disable/rename variables
- Generates branch declarations in the format: ContainerName_%SYS%.variableName -> branchName_%SYS%
- Handles noSys variables by replacing %SYS% with NOSYS in the branch declaration
- Creates the appropriate output algorithms to write the variables
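The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the Output block itself: the OutputVar class and branch_declarations function are hypothetical, and the handling of noSys variables (no suffix on the branch name unless alwaysAddNosys is set) is a reading of the options described on this page.

```python
from dataclasses import dataclass


@dataclass
class OutputVar:
    """Hypothetical record of one addOutputVar registration."""
    variable: str
    output: str
    no_sys: bool = False
    enabled: bool = True


def branch_declarations(container, prefix, registered, always_add_nosys=False):
    """Build 'source -> branch' strings for the enabled variables."""
    decls = []
    for var in registered:
        if not var.enabled:
            continue
        if var.no_sys:
            # Assumption: noSys branches get no suffix unless requested.
            suffix = "_NOSYS" if always_add_nosys else ""
            decls.append(
                f"{container}_NOSYS.{var.variable} -> {prefix}{var.output}{suffix}")
        else:
            decls.append(
                f"{container}_%SYS%.{var.variable} -> {prefix}{var.output}_%SYS%")
    return decls


registered = [
    OutputVar("pt", "pt"),
    OutputVar("eta", "eta", no_sys=True),
    OutputVar("eff_medium_%SYS%", "eff_medium", enabled=False),
]
for line in branch_declarations("AnaMuons", "mu_", registered):
    print(line)
```

The disabled efficiency variable stays registered but produces no declaration until a user enables it.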
Notes from the Developers¶
While CP Algorithms has always come with some output facilities, the original intent was to provide an algorithm sequence that populates the event store. We then decided to make n-tuple output the primary target, because we believe that this is the output most users want (based on what other frameworks were doing) and because there are quite a few subtleties in implementing n-tuple output. Populating the event store is still a supported workflow.
This differs from the typical pattern in Athena configuration in that it collects the list of output branches as it goes through the individual configuration blocks (as opposed to a long list in the output configuration). Besides avoiding the long list in a single place, it also makes it a lot easier to adjust the list based on what algorithms get actually configured and what is available.
The reason we have the mechanism to collect aux-types during configuration and initialization is that this allows us to create exactly the same output n-tuple format, even if not a single event is processed. This can make subsequent processing more straightforward, particularly for RDataFrame users. It is also assumed that in many cases those aux-types will not have to be manually specified, e.g. systematics handles and columnar accessors will already declare the aux-types.
There is also a mechanism for providing xAOD outputs, but it is not clear if that is used by anyone, or if it is still working or providing sufficient functionality to be useful.
Ideally we wouldn't need the Thinning block in the user configuration.
When we originally set up the configuration that was the easiest thing
to implement, but for practical purposes it would be much nicer if the
output block automatically took care of the thinning.