Selection Handling¶

The basic idea for handling selections is that every time a new cut is applied a new selection decoration is created. Subsequent algorithms that want to use the cut result then read that decoration. View containers are not used because they can lead to very complex setups when using multiple working points, while also potentially propagating systematics to algorithms that should not be affected by them. It is however possible to load a special algorithm at the end of the sequence that creates a view container if the user needs one.

At the algorithm level the selections are represented by decorations on the individual objects. Instead of accessing the decorations directly, algorithms access them via special selection accessors and selection handles, which make it possible to select the exact decoration type and to use logical expressions (&& and ||) when reading selections.

At the configuration level, related selection decorations are grouped together and given a name defined by the user (e.g. tight). This name is typically different from the name of any selection decoration, and is what's used in the configuration file. The list of decorations belonging to a selection will also grow as we go through the analysis sequence, as more decorations get added that were not available to previous blocks and algorithms.

For event selection, a similar system is used with decorations placed on EventInfo, but with some extensions to handle e.g. aborting an event early if it fails selection for all systematics.

Requirements¶

We need to be able to handle multiple selections for an object concurrently. This could be multiple working points (e.g. tight and loose) or changing an option on the same working point. All of the decorations should be accessible on the same container (as opposed to separate containers for each selection), to allow combining selections like "loose but not tight" objects, as well as to allow efficient storage of decorations in the output n-tuple.

Selections are generally not just the output of a single selection algorithm, but any algorithm can add a cut. This could e.g. be a calibration tool that will introduce a cut for objects outside the valid calibration range. This needs to be handled efficiently, as well as robustly handling reordering, adding and dropping algorithms.

The system needs to support both preselections and full physics selections. The full physics selection is what it says on the box: it contains all cuts defined up to that point, and is used when being exact is important, e.g. when calculating overlap removal, event-level variables, or event selections. The preselection is used to skip some objects when calculating object-level variables, e.g. to avoid calculating scale factors outside the valid kinematic range or for objects that will never be looked at. As such, the preselection is more inclusive and can still accept some objects dropped in the full selection.

For selections that depend on systematics, the systematic shouldn't propagate to other algorithms via the preselection. This is mainly an issue for overlap removal, which can propagate systematics from one input container to another. If handled incorrectly, this can increase both the processing time and the number of branches in the output n-tuple, e.g. by creating separate muon scale factor branches for different jet systematics.

We need to be able to exclude specific cuts from a selection in one specific place where the selection is used (while keeping them for all other uses). This is extremely niche: there is essentially one use case, in which missing ET needs to exclude cuts from overlap removal. However, it still needs to be supported.

It should be possible to get an object-level cut-flow, showing how many objects survive after each cut. For selection tools that return an AcceptData, this should also report each cut inside that selection as a separate cut.
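As a rough illustration of what an object-level cut-flow reports, the sketch below counts survivors after each cut in sequence. It is a toy model only: the function `cut_flow`, the cut names, and the use of plain dictionaries in place of decorated objects are all assumptions made for this example, not part of the framework.

```python
def cut_flow(objects, cut_names):
    """Count how many objects survive after each cut, applied in order.

    `objects` is a list of dicts mapping cut name -> bool, a hypothetical
    stand-in for per-object selection decorations.
    """
    survivors = list(objects)
    flow = [("all", len(survivors))]
    for name in cut_names:
        # Keep only the objects that also pass this cut.
        survivors = [obj for obj in survivors if obj[name]]
        flow.append((name, len(survivors)))
    return flow


# Four toy muons with three hypothetical cuts each.
muons = [
    {"pt": True, "eta": True, "quality": True},
    {"pt": True, "eta": False, "quality": True},
    {"pt": False, "eta": True, "quality": True},
    {"pt": True, "eta": True, "quality": False},
]

flow = cut_flow(muons, ["pt", "eta", "quality"])
for name, count in flow:
    print(f"{name:8s} {count}")
```

For a selection tool that reports several cuts at once, each of those cuts would appear as its own row in such a table.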

User Configuration Level¶

At the configuration level users see selections as a simple string, usually set as an option on the block that needs it. This can be set to an empty string to turn it into a baseline selection, i.e. a selection that becomes part of every single selection. Blocks that use the same selection name contribute to the same selection.

It is also possible to use simple boolean logic operations for a selection string, e.g. loose||tight or loose&&!tight.

Expression Syntax¶

The selection expression parser supports the following operators with standard precedence rules:

! (NOT) - highest precedence
&& (AND) - medium precedence
|| (OR) - lowest precedence
() - parentheses for explicit grouping

Examples of valid expressions:

tight||loose - objects passing either tight or loose selection
loose&&!tight - objects passing loose but not tight selection
(loose||tight)&&!veto - objects passing loose or tight, but not the veto
!(tight&&forward) - logical negation of a compound expression

The parser does not support other logical operators (e.g. XOR), variable substitution, or other advanced features. Whitespace is ignored in expressions.
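The precedence rules above can be sketched with a small recursive-descent evaluator. This is not the framework's parser; `tokenize`, `evaluate`, and the `flags` dictionary (selection name to pass/fail for a single object) are hypothetical names introduced only to illustrate how !, && and || bind.

```python
import re

# One token: an operator, a parenthesis, or a selection name.
_TOKEN = re.compile(r"\s*(\|\||&&|!|\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(expr):
    """Split an expression into tokens, ignoring whitespace."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot parse {expr!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, flags):
    """Evaluate a selection expression for one object.

    `flags` maps each selection name to True/False for that object.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "||":
            advance()
            rhs = parse_and()  # always parse the right side fully
            value = value or rhs
        return value

    def parse_and():  # medium precedence
        value = parse_not()
        while peek() == "&&":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():  # highest precedence
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = advance()
        if tok == "(":
            value = parse_or()
            if advance() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or tok in ("||", "&&", "!", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return bool(flags[tok])

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


flags = {"loose": True, "tight": False, "veto": False, "forward": True}
print(evaluate("loose&&!tight", flags))
print(evaluate("loose || tight && veto", flags))
```

In the last call && binds tighter than ||, so the expression is read as loose||(tight&&veto) rather than (loose||tight)&&veto.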

Container.selection Syntax¶

There is also a Container.selection syntax, which allows specifying both the container and the selection explicitly. This is primarily used with event-level algorithms that operate on multiple containers, such as:

Missing ET reconstruction (reads multiple object containers)
Overlap removal (operates on pairs of containers)
Event selection (may reference selections from multiple containers)

The syntax combines the container name with a selection expression:

AnaMuons.medium - medium muons from the AnaMuons container
AnaElectrons.loose - loose electrons from the AnaElectrons container
AnaMuons.medium||tight - muons passing medium or tight selection

This contrasts with object-level algorithms configured via blocks, where the container is implicit from the block's context and only the selection name needs to be specified.
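A minimal sketch of how such a string could be separated into its two parts is shown below. The helper `split_container_selection` is hypothetical, and treating a string without a dot as "container with an empty selection" is an assumption of this example rather than documented behavior.

```python
def split_container_selection(spec):
    """Split 'Container.selection' at the first dot.

    Returns (container, selection_expression). If there is no dot,
    the selection expression is returned as an empty string
    (an assumption made for this sketch).
    """
    container, _, selection = spec.partition(".")
    return container, selection


print(split_container_selection("AnaMuons.medium"))
print(split_container_selection("AnaMuons.medium||tight"))
```

Everything after the first dot is kept as one selection expression, which is why AnaMuons.medium||tight refers to a single container with the expression medium||tight.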

Configuration Block Level¶

Within the configuration blocks the selections are usually accessed via the ConfigAccumulator. There should generally be no hard-coded selection names in the block itself (except for the baseline selection), and selection names should be set via options.

The selections will generally be built up step-by-step. So if you request a selection, you will only get the selections defined up to this point. As a developer it is your responsibility to ensure that any selections that have to have run before your algorithm were scheduled beforehand.

For preselections there is the getPreselection function that will give you a string that a read selection accessor can understand. Most single object algorithm accessors will have a preselection property to configure it, and this can be configured like this:

alg.preselection = config.getPreselection (self.containerName, self.selectionName)

For full physics selections there are the getFullSelection and readNameAndSelection functions. The former works like the getPreselection function, while the latter returns both the container and selection, e.g.:

alg.muons, alg.muonsSelection = config.readNameAndSelection (self.muons)

For adding cuts to the selection you need to first configure your algorithm to add the decoration to the object and then register that decoration. If you use a write selection accessor this could look like (see explanation below):

alg.selectionDecoration = 'forwardSelection_' + self.selectionName + ',as_char'
config.addSelection (self.containerName, self.selectionName, alg.selectionDecoration, preselection=True)

The name needs to be unique for the object type you are working on, so it should include the name of the selection (unless it is the baseline selection), and also a unique string for your algorithm. The preselection flag (which is optional) indicates that a selection should be part of the preselection for subsequent algorithms.

The as_char indicates that the selection should be represented as a char type decoration. There is a second type as_bits that represents them as a bit mask in case it comes from a selection tool that reports multiple cuts at once. There is also an invert suffix (e.g. tight,invert) that applies logical negation to the selection, though it is generally clearer to use the ! operator in expressions (e.g. !tight) instead.
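To make the suffix notation concrete, here is a toy parser for strings such as forwardSelection_tight,as_char. It is only a sketch: `DecorationSpec`, `parse_spec` and `passes` are hypothetical names, defaulting to as_char when no type is given is an assumption, and so is the convention that a bit mask passes only when every bit of a 32-bit mask is set.

```python
from dataclasses import dataclass

# Assumption for this sketch: a 32-bit mask in which every set bit
# means "this cut passed", so all bits set means "object accepted".
ALL_BITS = 0xFFFFFFFF


@dataclass(frozen=True)
class DecorationSpec:
    name: str
    kind: str      # "as_char" or "as_bits"
    invert: bool


def parse_spec(spec):
    """Parse 'name[,as_char|as_bits][,invert]' into a DecorationSpec."""
    name, *options = spec.split(",")
    kind, invert = None, False
    for option in options:
        if option in ("as_char", "as_bits"):
            if kind is not None:
                raise ValueError(f"more than one type given in {spec!r}")
            kind = option
        elif option == "invert":
            invert = True
        else:
            raise ValueError(f"unknown option {option!r} in {spec!r}")
    # Assumed default when no type is given.
    return DecorationSpec(name, kind or "as_char", invert)


def passes(spec, value):
    """Interpret a raw decoration value according to its spec."""
    if spec.kind == "as_char":
        result = value != 0
    else:
        result = (value & ALL_BITS) == ALL_BITS
    return not result if spec.invert else result


spec = parse_spec("forwardSelection_tight,as_char")
print(spec, passes(spec, 1))
```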

The preselection flag indicates that the selection should also be part of the preselection. Whether this is appropriate is left to the developer. There are some cases in which this is clear, but in many cases it is a judgement call. However, it is important that this is never done for selections that include systematics (except momentum systematics on the container they are on), as that would propagate the systematic to all algorithms using that preselection. If you want to include such a selection in the preselection you have to create a separate selection that is an OR over all systematics for that selection, which is then systematics independent. This is however rarely necessary.
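The "OR over all systematics" idea can be pictured as follows. This is a conceptual sketch only: the function name and the systematic names in the example are made up, and in practice the combined decision would be written as a separate decoration by an algorithm rather than computed from a dictionary.

```python
def or_over_systematics(decisions_per_systematic):
    """Return True if the object passes for at least one systematic.

    `decisions_per_systematic` maps a systematic name to the pass/fail
    decision of the same cut under that variation. The result no longer
    depends on which systematic is being processed, so it can be used
    as a preselection without propagating systematics.
    """
    return any(decisions_per_systematic.values())


# Hypothetical object that fails nominally but passes under one variation.
decisions = {"NOSYS": False, "JET_Resolution__1up": True}
print(or_over_systematics(decisions))
```

An object kept by this looser decision can still be rejected later by the systematics-dependent full selection.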

The optional writeToOutput parameter controls whether the selection decoration is included in the output n-tuple. This defaults to true, but can be set to false for internal selections that are only needed during processing (e.g., intermediate cuts used by other algorithms but not interesting for analysis). Example:

config.addSelection (self.containerName, self.selectionName,
                     alg.selectionDecoration, writeToOutput=False)

If you need to drop specific cuts from a selection you have to specify a label for them by adding comesFrom=... for addSelection, and then you need to specify the set of labels via the excludeFrom={...} option when reading the selection. That will apply all the selection cuts that don't have any of the excluded labels. It should be noted that there are at least two alternatives to this:

The algorithms or blocks can be reordered so that the selection gets added after the point at which it needs to be excluded. This is a very clean solution, where possible. However, there may be good reasons not to reorder algorithms, in which case the excludeFrom mechanism is a reasonable alternative.
Have the user create a separate selection name for the selection that is meant to be excluded, e.g. tight and selectOR. Then specify just tight where you don't want to apply OR, and tight&&selectOR everywhere else. That definitely works, but it has a moderate-level impact on the user configuration, invites mistakes, and also requires users to be aware they have to do this in the first place.
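The bookkeeping behind addSelection, the preselection flag, and the comesFrom / excludeFrom labels can be pictured with a toy registry. This is not the ConfigAccumulator implementation: the class `SelectionRegistry`, its method names, the decoration names, and joining cuts with && are assumptions chosen for illustration.

```python
class SelectionRegistry:
    """Toy model of named selections built up from registered cuts."""

    def __init__(self):
        # Each entry: (container, selection, decoration, preselection, label)
        self._cuts = []

    def add_selection(self, container, selection, decoration,
                      preselection=False, comes_from=None):
        self._cuts.append(
            (container, selection, decoration, preselection, comes_from))

    def _matching(self, container, selection):
        # Baseline cuts (empty selection name) are part of every selection.
        return [cut for cut in self._cuts
                if cut[0] == container and cut[1] in ("", selection)]

    def get_full_selection(self, container, selection, exclude_from=()):
        return "&&".join(
            decoration
            for _, _, decoration, _, label in self._matching(container, selection)
            if label not in exclude_from)

    def get_preselection(self, container, selection):
        return "&&".join(
            decoration
            for _, _, decoration, presel, _ in self._matching(container, selection)
            if presel)


reg = SelectionRegistry()
reg.add_selection("AnaMuons", "", "baseline", preselection=True)
reg.add_selection("AnaMuons", "tight", "id_tight", preselection=True)

# Reading at this point only sees the cuts registered so far.
early = reg.get_full_selection("AnaMuons", "tight")

reg.add_selection("AnaMuons", "tight", "passesOR_tight", comes_from="or")
reg.add_selection("AnaMuons", "loose", "id_loose")

full = reg.get_full_selection("AnaMuons", "tight")
without_or = reg.get_full_selection("AnaMuons", "tight", exclude_from={"or"})
presel = reg.get_preselection("AnaMuons", "tight")
print(early, full, without_or, presel, sep="\n")
```

The `early` value also shows the step-by-step build-up: a selection read before a later cut is registered does not include that cut.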

Algorithm Level¶

At the algorithm level you will generally read each selection via an ISelectionReadAccessor, SelectionReadHandle, or SysReadSelectionHandle, with the latter two being wrappers around the former. In most cases the best choice is the SelectionReadHandle, which reads a simple selection without systematics and declares a property for it. If you actually need systematics on the input selection (for full physics selections), you have to use SysReadSelectionHandle instead.

When selections depend on systematics, the SysReadSelectionHandle automatically manages decoration name variants with the _%SYS% suffix pattern (e.g. mySelection_JET_Resolution__1up). The handle internally caches accessors per systematic set for efficient access. For details on systematic handling, see the Systematics Handling page.
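The name substitution and per-systematic caching can be sketched in a few lines. This only mimics the idea: `resolve_decoration_name` is a hypothetical helper, and using "NOSYS" for the nominal case is an assumption of this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve_decoration_name(pattern, systematic):
    """Replace the %SYS% placeholder with a systematic name.

    Results are cached per (pattern, systematic) pair, so repeated
    lookups for the same systematic do not redo the string work.
    """
    # Assumption: the nominal (empty) systematic is spelled "NOSYS".
    sys_name = systematic if systematic else "NOSYS"
    return pattern.replace("%SYS%", sys_name)


print(resolve_decoration_name("mySelection_%SYS%", "JET_Resolution__1up"))
print(resolve_decoration_name("mySelection_%SYS%", ""))
```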

In general for an object level algorithm you will want to define a preselection, that allows upstream algorithms to skip objects. Even if you expect to run first this is considered good practice. That looks something like:

SysReadSelectionHandle m_preselection {this, "preselection", "", "the preselection to apply"};

And then in the code you might use it like:

for (xAOD::Muon *muon : *muons)
{
  if (m_preselection.getBool (*muon, sys))
  {
    ...
  } else
  {
    ...
  }
}

Note that the else branch is very important. Even for skipped objects you will generally have to write out all the decorations your algorithm would set.

An empty preselection string (the default if not configured) means no preselection is applied - all objects pass. This is implemented via a special null accessor that always returns true, allowing algorithms to run on all objects in the container.
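The preselection pattern, including the always-true behavior for an empty string and the importance of the else branch, can be illustrated with a small Python model. Everything here is hypothetical: objects are plain dictionaries, `make_preselection` stands in for a read accessor, and the toy scale factor values and the -1.0 placeholder are invented for the example.

```python
def make_preselection(name):
    """Return a callable that reads the preselection for one object.

    An empty name yields a "null" reader that accepts every object.
    """
    if not name:
        return lambda obj: True
    return lambda obj: bool(obj[name])


def toy_scale_factor(pt):
    # Invented numbers, purely for illustration.
    return 0.98 if pt > 30.0 else 0.95


def decorate_scale_factors(objects, preselection_name=""):
    passes = make_preselection(preselection_name)
    for obj in objects:
        if passes(obj):
            obj["scale_factor"] = toy_scale_factor(obj["pt"])
        else:
            # Skipped objects still get the decoration, with a
            # placeholder value, so it exists on every object.
            obj["scale_factor"] = -1.0


muons = [{"pt": 45.0, "presel": 1}, {"pt": 5.0, "presel": 0}]
decorate_scale_factors(muons, "presel")
print([m["scale_factor"] for m in muons])

all_muons = [{"pt": 45.0}, {"pt": 5.0}]
decorate_scale_factors(all_muons)  # empty preselection: every object passes
print([m["scale_factor"] for m in all_muons])
```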

For writing out selections there is the ISelectionWriteAccessor, SelectionWriteHandle, or SysWriteSelectionHandle that have analogous functionality. Though (for boolean selections) you can also just write out a char type decoration normally, and then register it with an added ,as_char in the configuration. Since there are no logic expressions when writing, a lot of the selection accessor mechanism goes unused.

If you have a selection tool that gives you an AcceptData bit mask described via an AcceptInfo, you will need the ISelectionNameSvc in your algorithm:

ServiceHandle<ISelectionNameSvc> m_nameSvc {"SelectionNameSvc", "MuonSelectionAlg"};

and then pass the container and selection decoration name into that service in initialize:

if (!m_nameSvc.empty())
{
  ANA_CHECK (m_nameSvc.retrieve());
  ANA_CHECK (m_nameSvc->addAcceptInfo (m_muonsHandle.getNamePattern(),
    m_selectionHandle.getLabel(), m_selectionTool->getAcceptInfo()));
}

and then in execute pass out the selection via setBits:

m_selectionHandle.setBits (*muon, selectionFromAccept (m_selectionTool->accept (*muon)), sys);

Notes from the Developers¶

It should be acknowledged that the selection handling as we have it today is a significant evolution of the original design to meet user requirements. As such, some aspects are not how we would have built them if we started from scratch, and further redesigns may be needed in the future.

While building up selections step-by-step works great for preselections, it may make sense to lock a selection once it is used as a full physics selection to avoid ordering mistakes. This would probably need an "escape hatch" in case adding cuts to a selection after it has been read is the desired behavior, but in most cases that is likely to be a bug.

It might be nice if the excludeFrom mechanism could also be expressed through the logic expressions, e.g. specify something like tight{exclude=OR} and then you get the tight selection without overlap removal.

The whole as_char vs as_bits mechanism seems way too complicated, particularly given how niche a feature it is. It would probably be a lot better if every single selection decoration was of type char and if a bit mask for individual cuts is needed just add a second decoration. That would completely eliminate the need to do anything special for writing out selections, they'd just all be written like regular char decorations. There could still be some special helpers for handling selections that are bit-masks, but most selections are not.

And even for reading selections for which we need to support formulas, it may still be much more practical to treat them as simple char decorations at the algorithm level. The idea would essentially be that if a selection doesn't exist as a single char decoration it would be created before the algorithm is run, so the algorithm doesn't have to have special handling and can just read a char. While it may sound like this would create a lot of temporaries, in many cases the decoration would exist already:

For preselection decorations, whenever a new preselection is created the previous preselection will already be integrated (as an object that fails any earlier preselection automatically fails the new one).
For full selections we will typically have to create the selection decorations at some point anyways, to write it to disk.
The main exception would be logic expressions (e.g. loose||tight) which would have to be evaluated and decorated on the object. Same for selections that exclude a cut, etc.

For moving to columnar we would likely have to move to a model of reading/writing char to avoid overly complex logic in the tool. If anything we may need some special handling for bit fields for selections in columnar mode, but that would likely be absorbed in the columnar infrastructure.