
N-Tuple Output

CP Algorithms come with their own output mechanism that writes n-tuples based on the provided configuration. There are reasonable defaults for most blocks, as well as many customization options, including several options for MC-only and DSID-specific outputs.

The output system takes advantage of the systematics handling system to write out only the minimal set of systematic variations for each variable. For MET, it also writes out only the Final term by default, rather than all terms.

The system also guarantees that you get the same n-tuple output format (a.k.a. schema) for each configuration, even if no events were processed. This is different from a lot of output systems in ATLAS and can sometimes require some extra work, but it brings major advantages for processing the n-tuples in RDataFrame and other analysis frameworks.

For a practical introduction to using n-tuple output, see the tutorial.

User Configuration Level

The Output section in the YAML configuration file controls what goes into the output n-tuple. Configuration blocks automatically register variables they create, so the Output section primarily specifies which containers to write and optionally controls which variables are enabled.

Basic Structure

The minimal Output section specifies the tree name and the containers to write:

Output:
  treeName: analysis
  containers:
    mu_: AnaMuons      # Muon variables prefixed with "mu_"
    el_: AnaElectrons  # Electron variables prefixed with "el_"
    jet_: AnaJets      # Jet variables prefixed with "jet_"
    met_: AnaMET       # MET variables prefixed with "met_"
    "": EventInfo      # EventInfo variables with no prefix

The treeName option specifies the name of the output TTree (default: analysis). The containers dictionary maps output prefixes to container names. The prefix is prepended to all branch names from that container. Use an empty string "" for no prefix.
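As a rough illustration of how the prefixes combine with the names that configuration blocks register, the sketch below builds branch names from a containers mapping. The helper name branch_names and the registered variable lists are hypothetical, and the sketch ignores systematics suffixes and commands, which the real Output block also applies.

```python
def branch_names(containers, registered):
    """Prepend each container's output prefix to its registered output names.

    containers: dict mapping output prefix -> container name (as in the YAML).
    registered: dict mapping container name -> list of output names
                (a hypothetical stand-in for what configuration blocks register).
    """
    names = []
    for prefix, container in containers.items():
        for output_name in registered.get(container, []):
            # An empty prefix ("") leaves the output name unchanged.
            names.append(prefix + output_name)
    return names


containers = {"mu_": "AnaMuons", "": "EventInfo"}
registered = {
    "AnaMuons": ["pt", "eta", "phi"],
    "EventInfo": ["runNumber", "eventNumber"],
}
print(branch_names(containers, registered))
# ['mu_pt', 'mu_eta', 'mu_phi', 'runNumber', 'eventNumber']
```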

Automatic Variable Registration

Configuration blocks automatically register output variables as they are configured. For example, when you configure muons with a working point:

Muons:
  - containerName: AnaMuons
    WorkingPoint:
      - selectionName: medium
        quality: Medium
        isolation: Loose_VarRad

The muon configuration blocks automatically register kinematic variables (pt, eta, phi, m), efficiency scale factors, and other relevant variables. These will be written to the output without requiring explicit specification.

Enabling and Disabling Variables

The commands option provides regex-based control over which variables are written:

Output:
  treeName: analysis
  commands:
    - disable actualInteractionsPerCrossing      # Disable specific variable
    - enable someDisabledVariable                # Enable disabled variable
    - rename mu_(.*) muon_\1                     # Rename branches matching pattern
    - optional enable jet_ghostTrack.*           # Optional pattern (no error if not found)
  containers:
    mu_: AnaMuons
    el_: AnaElectrons

Commands are processed in order and support three operations:

  • enable pattern - Enable all variables matching the regex pattern
  • disable pattern - Disable all variables matching the regex pattern
  • rename pattern replacement - Rename variables matching pattern using regex substitution

By default, if a pattern doesn't match any variables, an error is raised. Prefix the command with optional to suppress this error for non-critical patterns:

commands:
  - optional enable jet_ghostTrack.*  # Won't error if no ghostTrack variables exist
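The command semantics described above can be modelled in a few lines of plain Python. This is a hypothetical sketch, not the Output block's actual code: it assumes each pattern must match the whole variable name (re.fullmatch) and represents the variables as a simple name to enabled-flag dictionary.

```python
import re

_ACTIONS = {"enable", "disable", "rename"}


def apply_commands(variables, commands):
    """Apply enable/disable/rename commands, in order, to a variable table.

    variables: dict mapping output name -> enabled flag (a modified copy is returned).
    commands:  strings such as "disable mu_.*" or "rename mu_(.*) muon_\\1",
               optionally prefixed with "optional".
    """
    result = dict(variables)
    for command in commands:
        words = command.split()
        optional = words[0] == "optional"
        if optional:
            words = words[1:]
        action, pattern = words[0], words[1]
        if action not in _ACTIONS:
            raise ValueError(f"unknown command {action!r}")
        matches = [name for name in result if re.fullmatch(pattern, name)]
        if not matches and not optional:
            raise ValueError(f"command {command!r} matched no variables")
        for name in matches:
            if action == "enable":
                result[name] = True
            elif action == "disable":
                result[name] = False
            else:  # rename: expand backreferences such as \1 from this match
                new_name = re.fullmatch(pattern, name).expand(words[2])
                result[new_name] = result.pop(name)
    return result


variables = {"mu_pt": True, "mu_eta": True,
             "actualInteractionsPerCrossing": True, "jet_timing": False}
commands = [
    "disable actualInteractionsPerCrossing",
    "enable jet_timing",
    r"rename mu_(.*) muon_\1",
    "optional enable jet_ghostTrack.*",  # no match, but marked optional
]
print(apply_commands(variables, commands))
# {'actualInteractionsPerCrossing': False, 'jet_timing': True, 'muon_pt': True, 'muon_eta': True}
```

Processing the commands strictly in order matters: a rename that runs before an enable would change which names the later pattern has to match.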

Explicit Variable Specification

When you need variables that aren't automatically registered by configuration blocks, use vars to specify them explicitly:

Output:
  vars:
    - EventInfo.actualInteractionsPerCrossing -> actualMuScaled
    - AnaMuons_NOSYS.muonType -> mu_muonType type=uint16
    - MissingET.px -> met_px metTerm=Final
  containers:
    mu_: AnaMuons
    "": EventInfo

The format is Container.variableName -> outputName, optionally followed by type=typename if automatic type detection fails. The container name may include a _NOSYS or _%SYS% suffix to handle variables affected by systematics. For MET variables, the MET term can be specified with metTerm=TermName (e.g. metTerm=Final).
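To make the declaration format concrete, the sketch below splits such a string into its parts. The helper parse_var_spec and its dictionary layout are hypothetical; the only assumption is the Container.variable -> output key=value shape shown above, with type and metTerm as the optional keys.

```python
import re

# Container.variable -> outputName, followed by optional key=value options.
_SPEC = re.compile(
    r"^\s*(?P<container>[^.\s]+)\.(?P<variable>\S+)\s*->\s*(?P<output>\S+)(?P<options>.*)$"
)


def parse_var_spec(spec):
    """Split a variable declaration into container, variable, output and options."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"malformed variable declaration: {spec!r}")
    result = {
        "container": match["container"],
        "variable": match["variable"],
        "output": match["output"],
    }
    for option in match["options"].split():
        key, sep, value = option.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {option!r}")
        result[key] = value  # e.g. type=uint16 or metTerm=Final
    return result


print(parse_var_spec("AnaMuons_NOSYS.muonType -> mu_muonType type=uint16"))
# {'container': 'AnaMuons_NOSYS', 'variable': 'muonType', 'output': 'mu_muonType', 'type': 'uint16'}
```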

There are also older, MET-specific options (metVars and truthMetVars) that automatically pick up the configured MET term (see MET Handling below). They are less relevant now that the term can be specified directly in the variable string:

Output:
  metVars:
    - AnaMET_%SYS%.met -> met_%SYS%
    - AnaMET_%SYS%.phi -> met_phi_%SYS%
  truthMetVars:
    - TruthMET_NOSYS.met -> truth_met

MC-Only Content

To avoid crashes when processing data, declare MC-only content separately:

Output:
  containers:
    mu_: AnaMuons
    el_: AnaElectrons
  containersOnlyForMC:
    truth_mu_: TruthMuons
    truth_el_: TruthElectrons
  varsOnlyForMC:
    - EventInfo.truthEventWeight -> truthWeight

The containersOnlyForMC and varsOnlyForMC options work identically to their non-MC counterparts but are only active when processing Monte Carlo. This allows a single configuration file to work for both data and MC.
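A hypothetical model of this behaviour: the MC-only entries are simply merged on top of the regular ones when the job runs on Monte Carlo, and ignored on data. The function effective_output and the dictionary layout are illustrative assumptions mirroring the YAML above, not the Output block's real interface.

```python
def effective_output(output_config, is_mc):
    """Return the containers and explicit variables active for this job.

    output_config: dict mirroring the YAML Output section (only the four keys
                   read below are used).
    is_mc:         True when processing Monte Carlo, False for data.
    """
    containers = dict(output_config.get("containers", {}))
    variables = list(output_config.get("vars", []))
    if is_mc:
        # MC-only entries are added on top of the regular ones.
        containers.update(output_config.get("containersOnlyForMC", {}))
        variables.extend(output_config.get("varsOnlyForMC", []))
    return containers, variables


output_config = {
    "containers": {"mu_": "AnaMuons", "el_": "AnaElectrons"},
    "containersOnlyForMC": {"truth_mu_": "TruthMuons", "truth_el_": "TruthElectrons"},
    "varsOnlyForMC": ["EventInfo.truthEventWeight -> truthWeight"],
}
data_containers, data_vars = effective_output(output_config, is_mc=False)
mc_containers, mc_vars = effective_output(output_config, is_mc=True)
print(sorted(data_containers))  # ['el_', 'mu_']
print(sorted(mc_containers))    # ['el_', 'mu_', 'truth_el_', 'truth_mu_']
```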

DSID-Based Filtering

For more fine-grained control, restrict containers to specific dataset IDs:

Output:
  containers:
    mu_: AnaMuons
  containersOnlyForDSIDs:
    signal_jet_:
      - 410470  # Specific DSID
      - "50.*"  # Regex pattern matching multiple DSIDs

This is particularly useful when different samples require different output containers, such as writing signal-specific containers only for signal samples.
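The entry list above mixes plain DSIDs (parsed by YAML as integers) and quoted regular expressions. A sketch of how such a list could be matched against the current sample follows; the function dsid_matches is hypothetical, and it assumes a regex must match the whole DSID written as a decimal string.

```python
import re


def dsid_matches(dsid, filters):
    """Return True if the sample's DSID matches any entry in filters.

    Integers must equal the DSID exactly; strings are treated as regular
    expressions matched against the full decimal string (an assumption).
    """
    for entry in filters:
        if isinstance(entry, int):
            if dsid == entry:
                return True
        elif re.fullmatch(entry, str(dsid)):
            return True
    return False


filters = [410470, "50.*"]
print(dsid_matches(410470, filters))  # True  (exact DSID)
print(dsid_matches(504321, filters))  # True  (matches "50.*")
print(dsid_matches(410471, filters))  # False
```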

You can also apply commands conditionally based on DSID:

Output:
  commandsOnlyForDSIDs:
    410470:  # Only for this DSID
      - enable signal_specific_variable

MET Handling

By default, MET containers extract only a single term rather than writing the full container:

Output:
  metTermName: Final          # Default MET term
  truthMetTermName: NonInt    # Default truth MET term
  containers:
    met_: AnaMET

The metTermName option (default: "Final") specifies which MET term to extract from the MET container. The truthMetTermName option (default: "NonInt") does the same for truth MET.
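Conceptually, a MET container holds one entry per term, and the default output keeps only the entry whose name equals metTermName. The sketch below models this with a hypothetical, simplified MetTerm record; the term names other than Final and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetTerm:
    """One term of a MET container (hypothetical, simplified record)."""
    name: str
    met: float  # MeV
    phi: float


def select_met_term(met_container, term_name="Final"):
    """Return the single term that the default output would keep."""
    for term in met_container:
        if term.name == term_name:
            return term
    raise KeyError(f"MET term {term_name!r} not found in container")


# Illustrative container with made-up values.
ana_met = [
    MetTerm("RefEle", 12000.0, 0.4),
    MetTerm("Muons", 8000.0, -2.1),
    MetTerm("Final", 45000.0, 1.2),
]
print(select_met_term(ana_met).met)  # 45000.0
```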

To write a MET container with all terms (for special studies), use containersFullMET:

Output:
  containers:
    met_: AnaMET              # Just the Final term
  containersFullMET:
    met_all_: AnaMET          # All MET terms

Note that containersFullMET requires a different prefix from the one used in containers to avoid conflicts. Keeping the Final-term MET and adding a completely separate set of branches with all MET terms means that existing analysis scripts continue to run unchanged, while the full set of terms is available where needed.

Selection Flag Aggregation

The output system can create aggregated selection flag branches that combine all cuts for a given working point:

Output:
  storeSelectionFlags: True              # Default
  selectionFlagPrefix: select            # Default prefix
  skipRedundantSelectionFlags: True      # Default

When storeSelectionFlags is enabled (the default), a single boolean flag is created for each working point that combines all selection cuts. For example, the muon "medium" working point would get a select_medium branch (written as mu_select_medium when the muons use the mu_ prefix).

The skipRedundantSelectionFlags option (default: True) prevents writing selection flags for selections that always pass, reducing unnecessary output.
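A minimal sketch of the aggregation, assuming each cut is available as a list of per-object booleans: the combined flag is the logical AND of all cuts for that working point. The function aggregate_flags is hypothetical, and it treats a working point as redundant only when it has no cuts at all; the real criterion may differ.

```python
def aggregate_flags(cuts_per_wp, n_objects, prefix="select", skip_redundant=True):
    """Combine per-cut booleans into one flag per working point.

    cuts_per_wp: dict mapping working point -> dict mapping cut name -> list of
                 per-object booleans (hypothetical stand-in for decorations).
    Returns a dict mapping flag name -> list of per-object booleans.
    """
    flags = {}
    for wp, cuts in cuts_per_wp.items():
        if not cuts and skip_redundant:
            # No cuts at all: the selection always passes, so the flag is redundant.
            continue
        flags[f"{prefix}_{wp}"] = [
            all(values[i] for values in cuts.values()) for i in range(n_objects)
        ]
    return flags


cuts = {
    "medium": {"quality": [True, True, False], "isolation": [True, False, True]},
    "baseline": {},  # hypothetical working point without any cuts
}
print(aggregate_flags(cuts, n_objects=3))
# {'select_medium': [True, False, False]}
```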

Advanced Options

alwaysAddNosys: If set to True, all branches get a systematics suffix, even those unaffected by systematics. This can be useful for frameworks that expect all branches to have systematic naming:

Output:
  alwaysAddNosys: True
  # Now even eta/phi get "_NOSYS" suffix

streamName: The name of the output stream (default: ANALYSIS). This is primarily relevant when integrating with larger Athena jobs:

Output:
  streamName: ANALYSIS  # Default

nonContainers: Explicitly declare which container names should be treated as non-containers (scalars rather than vectors). EventInfo is included by default:

Output:
  nonContainers: ['EventInfo', 'MyCustomScalar']

Complete Example

Here's a comprehensive example showing many output options:

Output:
  treeName: analysis

  # Basic containers
  containers:
    mu_: AnaMuons
    el_: AnaElectrons
    jet_: AnaJets
    met_: AnaMET
    "": EventInfo

  # MC-only containers
  containersOnlyForMC:
    truth_mu_: TruthMuons
    truth_jet_: AntiKt4TruthJets

  # DSID-specific containers
  containersOnlyForDSIDs:
    signal_:
      - "410.*"  # All 410xxx samples

  # Explicit variables
  vars:
    - EventInfo.runNumber -> runNumber
    - EventInfo.eventNumber -> eventNumber

  varsOnlyForMC:
    - EventInfo.mcChannelNumber -> mcChannelNumber

  # MET variables
  metVars:
    - AnaMET_%SYS%.sumet -> met_sumet_%SYS%

  # Control which variables are written
  commands:
    - disable .*averageInteractionsPerCrossing
    - enable jet_timing
    - rename mu_(.*) muon_\1

  # MET configuration
  metTermName: Final
  truthMetTermName: NonInt

  # Selection flags
  storeSelectionFlags: True
  skipRedundantSelectionFlags: True

Thinning

The Thinning block is necessary to prepare containers for output by creating view containers that contain only the selected objects. Without thinning, all objects from the input containers would be written to the output regardless of any selections applied.

Thinning:
  - containerName: AnaMuons
    selectionName: medium

  - containerName: AnaElectrons
    selectionName: loose

  - containerName: AnaJets
    selectionName: passJvt

Options:

  • containerName: The input container to thin
  • selectionName: The selection to use for thinning (objects passing this selection are kept)
  • outputName: An optional name for the output (thinned) container. If not specified, the input container name is used (creating a new copy under that name).
  • selection: An optional explicit selection decoration to use (alternative to selectionName)
  • deepCopy: If True, creates a deep copy of objects (default: False)
  • sortPt: If True, sorts output objects by pt (not supported with systematics)

The thinning process creates a view container that includes only objects passing the specified selection. This is critical for reducing output size and ensuring that only relevant objects are written to the n-tuple. Note that if an object passes the selection for even a single systematic, it is kept for all systematics. This is necessary so that the individual objects in the output vectors line up across all branches.
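The "kept for all systematics" rule amounts to taking the logical OR of the per-systematic selections and applying that single mask to every branch. A minimal sketch, with hypothetical helper names and a made-up systematic name:

```python
def thinning_mask(selection_per_sys):
    """Keep an object if it passes the selection in at least one systematic.

    selection_per_sys: dict mapping systematic name -> list of per-object
                       booleans; all lists must describe the same objects.
    """
    if not selection_per_sys:
        raise ValueError("need the selection for at least one systematic")
    masks = list(selection_per_sys.values())
    n_objects = len(masks[0])
    if any(len(mask) != n_objects for mask in masks):
        raise ValueError("all systematics must describe the same objects")
    return [any(mask[i] for mask in masks) for i in range(n_objects)]


def thin(values, mask):
    """Apply the shared mask so that every branch keeps the same objects."""
    return [value for value, keep in zip(values, mask) if keep]


selection = {
    "NOSYS": [True, False, False],
    "MUON_SCALE__1up": [True, True, False],  # made-up systematic name
}
mask = thinning_mask(selection)
print(mask)                                   # [True, True, False]
print(thin([31000.0, 9800.0, 4200.0], mask))  # [31000.0, 9800.0]
```

The second object fails the nominal selection but is still written in the NOSYS branches, because it passes for one variation.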

Configuration Block Level

The output system automatically collects variables from configuration blocks as they are configured, eliminating the need for users to manually specify long lists of output branches. Each configuration block registers the output variables it creates, and these are automatically written to the output tree based on the containers specified in the Output section.

When writing configuration blocks that produce output variables, you use the addOutputVar() method to register variables with the output system. This method is called during block configuration and tells the Output block which variables should be available for writing. The method signature is:

config.addOutputVar (containerName, variableName, outputName,
                     *, noSys=False, enabled=True, auxType=None)

The parameters are:

  • containerName: The container the variable belongs to.
  • variableName: The name of the decoration on the container. For variables that have systematic variations, include %SYS% in the name (e.g. 'effSF_%SYS%'), unless those variations are handled via shallow copies (e.g. 'pt').
  • outputName: The name used in the output ntuple/file.
  • noSys: If True, the variable has no systematic variations and will only be written once (not per systematic). Use this for quantities like eta or phi that don't change with systematics.
  • enabled: If False, the variable is registered but not written. The user can then enable that variable from the configuration without having to specify any extra information.
  • auxType: Override the type for the output variable. Common values are 'float', 'int', 'char'. This is occasionally needed when the automatic type detection doesn't work correctly (e.g. copying variables from the input file that don't have an accessor defined in the algorithm).
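To show what the parameters above amount to, here is a hypothetical mock that merely records addOutputVar() calls per container, using the documented signature. It is not the real ConfigAccumulator and does none of its bookkeeping or validation; the decoration name effSF_medium_%SYS% is made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OutputVar:
    variableName: str
    outputName: str
    noSys: bool
    enabled: bool
    auxType: Optional[str]


class MockAccumulator:
    """Hypothetical stand-in that only records addOutputVar() calls."""

    def __init__(self):
        # container name -> output name -> OutputVar
        self.outputVars: Dict[str, Dict[str, OutputVar]] = {}

    def addOutputVar(self, containerName, variableName, outputName,
                     *, noSys=False, enabled=True, auxType=None):
        self.outputVars.setdefault(containerName, {})[outputName] = OutputVar(
            variableName, outputName, noSys, enabled, auxType)


config = MockAccumulator()
config.addOutputVar('AnaMuons', 'pt', 'pt')  # varies via shallow copies
config.addOutputVar('AnaMuons', 'eta', 'eta', noSys=True)
config.addOutputVar('AnaMuons', 'effSF_medium_%SYS%', 'effSF_medium')
config.addOutputVar('AnaMuons', 'muonType', 'muonType',
                    noSys=True, enabled=False, auxType='uint16')
print(sorted(config.outputVars['AnaMuons']))
# ['effSF_medium', 'eta', 'muonType', 'pt']
```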

Basic Usage

# Kinematic variables - pt varies with systematics, others don't
config.addOutputVar (self.containerName, 'pt', 'pt')
config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)
config.addOutputVar (self.containerName, 'phi', 'phi', noSys=True)
config.addOutputVar (self.containerName, 'charge', 'charge', noSys=True)

Conditional Variables

When a variable is only produced by an algorithm that runs conditionally, the addOutputVar call should be inside the same conditional:

if self.decorateExtraVariables:
    alg = config.createAlgorithm( 'CP::TauExtraVariablesAlg',
                                  'TauExtraVariablesAlg' )
    alg.taus = config.readName (self.containerName)
    config.addOutputVar (self.containerName, 'nTracksCharged', 'nTracksCharged', noSys=True)

This ensures that the output variable is registered exactly when the algorithm that produces it is scheduled.

Type Override

Use auxType when the automatic type detection doesn't produce the desired result:

config.addOutputVar (self.containerName, 'NNDecayMode', 'NNDecayMode', noSys=True, auxType='int')
config.addOutputVar (self.containerName, 'passTATTauMuonOLR', 'passTATTauMuonOLR', noSys=True, auxType='char')

Disabled Variables

Sometimes you have variables that seem useful, but only to certain users. In that case you can add them as disabled variables, meaning they won't be written out by default:

# Efficiency value (disabled by default to reduce output size)
alg.efficiencyDecorationName = f'eff_{workingPoint}_%SYS%'
config.addOutputVar(containerName, alg.efficiencyDecorationName, f'eff_{workingPoint}',
                    enabled=False)

Users can then enable these with commands if needed:

Output:
  commands:
    - enable .*specialVariable

The advantages of registering a disabled variable, rather than having users add it by hand, are that enabling it is typically easier for the user and that n-tuples from different users are more consistent.

Explicit Type Specification

In some cases the automatic type detection doesn't work, in which case you will have to specify it manually:

# Force specific type
config.addOutputVar(containerName, 'muonType', 'muonType',
                    noSys=True, auxType='uint16')

Integration with Output System

The addOutputVar() method stores the variable information in the ConfigAccumulator. When the Output block runs, it retrieves all registered variables for each container specified in the containers option. The Output block then:

  1. Retrieves all registered output variables for each container
  2. Applies any commands to enable/disable/rename variables
  3. Generates branch declarations in the format: ContainerName_%SYS%.variableName -> branchName_%SYS%
  4. Handles noSys variables by replacing %SYS% with NOSYS in the branch declaration
  5. Creates the appropriate output algorithms to write the variables
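Steps 3 and 4 can be sketched as follows. This is a simplified, hypothetical model: the function branch_declarations and the tuple layout are assumptions, commands (step 2) and the creation of the writer algorithms (step 5) are left out, and the handling of noSys variables (nominal input, no branch suffix unless alwaysAddNosys is set) is inferred from the alwaysAddNosys description above.

```python
def branch_declarations(containers, output_vars, always_add_nosys=False):
    """Turn registered variables into 'Container.var -> branch' strings.

    containers:  dict mapping output prefix -> container name.
    output_vars: dict mapping container name -> list of
                 (variableName, outputName, noSys, enabled) tuples.
    """
    declarations = []
    for prefix, container in containers.items():
        for variable, output, no_sys, enabled in output_vars.get(container, []):
            if not enabled:
                continue  # registered but switched off
            if no_sys:
                # Read the nominal copy; add a suffix only if requested.
                source = f"{container}_NOSYS.{variable}"
                branch = prefix + output + ("_NOSYS" if always_add_nosys else "")
            else:
                source = f"{container}_%SYS%.{variable}"
                branch = f"{prefix}{output}_%SYS%"
            declarations.append(f"{source} -> {branch}")
    return declarations


containers = {"mu_": "AnaMuons"}
output_vars = {"AnaMuons": [
    ("pt", "pt", False, True),
    ("eta", "eta", True, True),
    ("eff_medium_%SYS%", "eff_medium", False, False),  # disabled by default
]}
for line in branch_declarations(containers, output_vars):
    print(line)
# AnaMuons_%SYS%.pt -> mu_pt_%SYS%
# AnaMuons_NOSYS.eta -> mu_eta
```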

Notes from the Developers

While CP Algorithms have always come with some output facilities, the original intent was to provide an algorithm sequence that populates the event store. We later decided to make n-tuple output the primary target, because we believe that is the output most users want (based on what other frameworks were doing) and there are quite a few subtleties in implementing it. Populating the event store is nevertheless still a supported workflow.

This differs from the typical pattern in Athena configuration in that the list of output branches is collected while going through the individual configuration blocks (as opposed to a long list in the output configuration). Besides avoiding a long list in a single place, this also makes it much easier to adjust the list based on which algorithms actually get configured and what is available.

The reason we have a mechanism to collect aux-types during configuration and initialization is that it allows us to create exactly the same output n-tuple format even if not a single event is processed. This can make subsequent processing more straightforward, particularly for RDataFrame users. We also expect that in many cases these aux-types will not have to be specified manually, since e.g. systematics handles and columnar accessors already declare them.

There is also a mechanism for providing xAOD outputs, but it is not clear whether anyone uses it, whether it still works, or whether it provides sufficient functionality to be useful.

Ideally, we wouldn't need the Thinning block in the user configuration. When we originally set up the configuration, that was the easiest approach to implement, but for practical purposes it would be much nicer if the Output block took care of the thinning automatically.