Selection Handling¶
The basic idea for handling selections is that every time a new cut is applied a new selection decoration is created. Subsequent algorithms that want to use the cut result then read that decoration. View containers are not used because they can lead to very complex setups when using multiple working points, while also potentially propagating systematics to algorithms that should not be affected by them. It is however possible to load a special algorithm at the end of the sequence that creates a view container if the user needs one.
At the algorithm level the selections are represented by decorations on
the individual objects. Instead of accessing the decorations directly,
algorithms read them via special selection accessors and selection
handles, which make it possible to select the exact decoration type and
to use logical expressions (&& and ||) when reading selections.
At the configuration level, related selection decorations are grouped
together and given a name defined by the user (e.g. tight). This name
is typically different from the name of any selection decoration, and is
what's used in the configuration file. The list of decorations belonging
to a selection will also grow as we go through the analysis sequence, as
more decorations get added that were not available to previous blocks
and algorithms.
For event selection, a similar system is used with decorations placed on
EventInfo, but with some extensions to handle e.g. aborting an event
early if it fails selection for all systematics.
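As a rough illustration of the early-abort idea, the sketch below keeps an event only if it passes the event selection for at least one systematic variation. The function name `keep_event`, the dictionary layout and the systematic labels are hypothetical and are not part of the framework.

```python
def keep_event(passes_by_systematic):
    """Return True if the event passes for at least one systematic.

    `passes_by_systematic` maps a systematic label to the boolean event
    selection decision (hypothetical layout, for illustration only).
    """
    return any(passes_by_systematic.values())


# Fails the nominal selection but passes one variation: keep processing.
event_a = {"nominal": False, "JET_Resolution__1up": True}
# Fails for every systematic: processing can be aborted early.
event_b = {"nominal": False, "JET_Resolution__1up": False}

print(keep_event(event_a), keep_event(event_b))
```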
Requirements¶
We need to be able to handle multiple selections for an object concurrently. This could be multiple working points (e.g. tight and loose) or changing an option on the same working point. All of the decorations should be accessible on the same container (as opposed to separate containers for each selection), to allow combining selections like "loose but not tight" objects, as well as to allow efficient storage of decorations in the output n-tuple.
Selections are generally not just the output of a single selection algorithm, but any algorithm can add a cut. This could e.g. be a calibration tool that will introduce a cut for objects outside the valid calibration range. This needs to be handled efficiently, as well as robustly handling reordering, adding and dropping algorithms.
The system needs to support both preselections and full physics selections. The full physics selection is what it says on the box: it contains all cuts defined up to that point, and is used when being exact is important, e.g. when calculating overlap removal, event-level variables, or event selections. The preselection is used to skip some objects when calculating object-level variables, e.g. to avoid calculating scale factors outside the valid kinematic range or for objects that will never be looked at. As such, the preselection is more inclusive and can still accept some objects dropped in the full selection.
For selections that depend on systematics, the systematic shouldn't propagate to other algorithms via the preselection. This is mainly an issue for overlap removal, which can propagate systematics from one input container to another. If handled incorrectly, this can increase both the processing time and the number of branches in the output n-tuple, e.g. by creating separate muon scale factor branches for different jet systematics.
We need to be able to exclude specific cuts from one specific place where the selection is used, while keeping them for all other uses. This is extremely niche: there is essentially one use case, in which missing ET needs to exclude cuts from overlap removal. However, it still needs to be supported.
It should be possible to get an object-level cut-flow, showing how many
objects survive after each cut. For selection tools that return an
AcceptData this should also report all cuts inside that selection as a
separate cut.
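To make the cut-flow requirement concrete, here is a minimal sketch that counts how many objects survive after each successive cut, given one bit mask per object. The bit layout (bit i set means cut i passed), the function name and the cut names are assumptions made for this example only.

```python
def object_cutflow(bit_masks, cut_names):
    """Count how many objects survive after each successive cut.

    Assumed layout: bit i of a mask is set when the object passes cut i.
    """
    flow = []
    for i, name in enumerate(cut_names):
        required = (1 << (i + 1)) - 1  # all cuts up to and including cut i
        survivors = sum(1 for mask in bit_masks if (mask & required) == required)
        flow.append((name, survivors))
    return flow


# Four hypothetical objects with three cuts each.
masks = [0b111, 0b011, 0b101, 0b000]
cuts = ["pt", "eta", "quality"]
flow = object_cutflow(masks, cuts)
for name, count in flow:
    print(f"{name}: {count}")
```

An object that fails an early cut is not counted for later cuts, even if its later bits are set, which is what makes this a cumulative cut-flow rather than a per-cut efficiency table.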
User Configuration Level¶
At the configuration level users see selections as a simple string, usually set as an option on the block that needs it. This can be set to an empty string to turn it into a baseline selection, i.e. a selection that becomes part of every single selection. Blocks that are given the same selection name contribute to the same selection.
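The grouping of decorations under user-chosen names can be pictured with a small toy model. The class `SelectionRegistry` and the decoration names below are hypothetical; the sketch only illustrates that the empty name acts as a baseline and that several blocks can contribute to one name.

```python
from collections import defaultdict


class SelectionRegistry:
    """Toy model: group decorations under user-chosen selection names."""

    def __init__(self):
        self._decorations = defaultdict(list)

    def add(self, selection_name, decoration):
        self._decorations[selection_name].append(decoration)

    def get(self, selection_name):
        # Baseline ("") decorations are part of every selection.
        decorations = list(self._decorations[""])
        if selection_name:
            decorations += self._decorations[selection_name]
        return decorations


registry = SelectionRegistry()
registry.add("", "calibration_valid")   # baseline cut
registry.add("tight", "id_tight")       # first block using "tight"
registry.add("tight", "iso_tight")      # second block using the same name
registry.add("loose", "id_loose")

print(registry.get("tight"))
```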
It is also possible to use simple boolean logic operations for a
selection string, e.g. loose||tight or loose&&!tight.
Expression Syntax¶
The selection expression parser supports the following operators with standard precedence rules:
- ! (NOT) - highest precedence
- && (AND) - medium precedence
- || (OR) - lowest precedence
- () - parentheses for explicit grouping
Examples of valid expressions:
- tight||loose - objects passing either tight or loose selection
- loose&&!tight - objects passing loose but not tight selection
- (loose||tight)&&!veto - objects passing loose or tight, but not the veto
- !(tight&&forward) - logical negation of a compound expression
The parser does not support other logical operators (e.g. XOR), variable substitution, or other advanced features. Whitespace is ignored in expressions.
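The precedence rules can be illustrated with a small recursive-descent evaluator. This is a sketch of the grammar described above, not the parser used by the framework; the function names and the dictionary of per-selection booleans are assumptions.

```python
import re

# One token: an operator, a parenthesis, or a selection name.
_TOKEN = re.compile(r"\s*(\|\||&&|!|\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, values):
    """Evaluate a selection expression against a dict of name -> bool."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():            # lowest precedence
        result = parse_and()
        while peek() == "||":
            advance()
            rhs = parse_and()  # always parse, so every token is consumed
            result = result or rhs
        return result

    def parse_and():           # medium precedence
        result = parse_not()
        while peek() == "&&":
            advance()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():           # highest precedence
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_primary()

    def parse_primary():
        token = peek()
        if token is None or token in ("||", "&&", ")"):
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        if token == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        return bool(values[token])

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token: {peek()!r}")
    return result


flags = {"loose": True, "tight": False, "veto": False, "forward": True}
print(evaluate("loose && !tight", flags))
```

Because && binds tighter than ||, an expression such as `loose || tight && veto` is read as `loose || (tight && veto)`.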
Container.selection Syntax¶
There is also a Container.selection syntax, which allows specifying
both the container and the selection explicitly. This is primarily used
with event-level algorithms that operate on multiple containers, such as:
- Missing ET reconstruction (reads multiple object containers)
- Overlap removal (operates on pairs of containers)
- Event selection (may reference selections from multiple containers)
The syntax combines the container name with a selection expression:
- AnaMuons.medium - medium muons from the AnaMuons container
- AnaElectrons.loose - loose electrons from the AnaElectrons container
- AnaMuons.medium||tight - muons passing medium or tight selection
This contrasts with object-level algorithms configured via blocks, where the container is implicit from the block's context and only the selection name needs to be specified.
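A minimal sketch of how such a string could be split into its two parts is shown below. The helper name `split_container_selection` is hypothetical, and treating a bare container name as "no selection" is an assumption of this example.

```python
def split_container_selection(spec):
    """Split 'Container.selection' at the first dot.

    Assumption: a name without a dot means an empty selection expression.
    """
    container, _, selection = spec.partition(".")
    if not container:
        raise ValueError(f"missing container name in {spec!r}")
    return container, selection


print(split_container_selection("AnaMuons.medium||tight"))
```

Everything after the first dot is kept as the selection expression, so operators such as || stay attached to the selection part.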
Configuration Block Level¶
Within the configuration blocks the selections are usually accessed via
the ConfigAccumulator. There should generally be no hard-coded
selection names in the block itself (except for the baseline selection),
and selection names should be set via options.
The selections will generally be built up step-by-step, so if you request a selection you will only get the selections defined up to that point. As a developer it is your responsibility to ensure that any selections that have to run before your algorithm were scheduled beforehand.
For preselections there is the getPreselection function that will give
you a string that a read selection accessor can understand. Most single
object algorithm accessors will have a preselection property to
configure it, and this can be configured like this:
alg.preselection = config.getPreselection (self.containerName, self.selectionName)
For full physics selections there are the getFullSelection and
readNameAndSelection functions. The former works like the
getPreselection function, while the latter returns both the container
and selection, e.g.:
alg.muons, alg.muonsSelection = config.readNameAndSelection (self.muons)
For adding cuts to the selection you need to first configure your algorithm to add the decoration to the object and then register that decoration. If you use a write selection accessor this could look like (see explanation below):
alg.selectionDecoration = 'forwardSelection_' + self.selectionName + ',as_char'
config.addSelection (self.containerName, self.selectionName, alg.selectionDecoration, preselection=True)
The name needs to be unique for the object type you are working on, so
it should include the name of the selection (unless it is the baseline
selection), and also a unique string for your algorithm. The
preselection flag (which is optional) indicates that a selection
should be part of the preselection for subsequent algorithms.
The as_char indicates that the selection should be represented as a
char type decoration. There is a second type as_bits that represents
them as a bit mask in case it comes from a selection tool that reports
multiple cuts at once. There is also an invert suffix (e.g.
tight,invert) that applies logical negation to the selection, though
it is generally clearer to use the ! operator in expressions (e.g.
!tight) instead.
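The comma-separated suffixes can be pictured with the following parsing sketch. The `DecorationSpec` class and `parse_decoration` function are hypothetical, and defaulting to a char decoration when no type suffix is given is an assumption of this example rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecorationSpec:
    name: str
    kind: str = "as_char"   # assumed default when no type suffix is given
    invert: bool = False


def parse_decoration(spec):
    """Parse 'name[,as_char|as_bits][,invert]' into its components."""
    name, *options = [part.strip() for part in spec.split(",")]
    kind, invert = "as_char", False
    for option in options:
        if option in ("as_char", "as_bits"):
            kind = option
        elif option == "invert":
            invert = True
        else:
            raise ValueError(f"unknown option {option!r} in {spec!r}")
    return DecorationSpec(name, kind, invert)


print(parse_decoration("forwardSelection_tight,as_char"))
```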
The preselection flag indicates that the selection should also be part
of the preselection. Whether this is appropriate is left to the
developer. There are some cases in which this is clear, but in many
cases it is a judgement call. However, it is important that this is
never done for selections that include systematics (except momentum
systematics on the container they are on), as that would propagate the
systematic to all algorithms using that preselection. If you want to
include such a selection in the preselection you have to create a
separate selection that is an OR over all systematics for that
selection, which is then systematics independent. This is however rarely
necessary.
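The difference between the two views of a selection can be sketched as follows. The functions below are a toy model, not the ConfigAccumulator; joining the decorations with && and the decoration names are assumptions made for illustration.

```python
# (decoration, is_preselection) pairs in registration order
registered = []


def add_selection(decoration, preselection=False):
    registered.append((decoration, preselection))


def get_full_selection():
    # The full physics selection contains every cut registered so far.
    return "&&".join(deco for deco, _ in registered)


def get_preselection():
    # The preselection only contains the cuts flagged for it.
    return "&&".join(deco for deco, is_pre in registered if is_pre)


add_selection("ptEtaCut,as_char", preselection=True)
# Depends on systematics, so it is kept out of the preselection.
add_selection("isolation_%SYS%,as_char")

print(get_preselection())
print(get_full_selection())
```

Keeping the systematics-dependent cut out of the preselection means downstream algorithms that only read the preselection do not pick up that systematic.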
The optional writeToOutput parameter controls whether the selection
decoration is included in the output n-tuple. This defaults to true, but
can be set to false for internal selections that are only needed during
processing (e.g., intermediate cuts used by other algorithms but not
interesting for analysis). Example:
config.addSelection (self.containerName, self.selectionName,
                     alg.selectionDecoration, writeToOutput=False)
If you need to drop specific cuts from a selection you have to specify a
label for them by adding comesFrom=... for addSelection, and then
you need to specify the set of labels via the excludeFrom={...} option
when reading the selection. That will apply all the selection cuts that
don't have any of the excluded labels. It should be noted that there are
at least two alternatives to this:
- The algorithms or blocks can be reordered so that the selection gets added after the point at which it needs to be excluded. This is a very clean solution, where possible. However, there may be good reasons not to reorder algorithms, and this is a reasonable alternative.
- Have the user create a separate selection name for the selection that is meant to be excluded, e.g. tight and selectOR. Then specify just tight where you don't want to apply OR, and tight&&selectOR everywhere else. That definitely works, but it has a moderate-level impact on the user configuration, invites mistakes, and also requires users to be aware they have to do this in the first place.
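The label-based exclusion described above can be sketched in a few lines. This is a toy model: the function `read_selection`, the tuple layout and the label "or" are hypothetical stand-ins for the comesFrom and excludeFrom options.

```python
def read_selection(cuts, exclude_from=frozenset()):
    """Return all decorations whose label is not in `exclude_from`.

    `cuts` is a list of (decoration, comes_from) pairs, where
    comes_from is None for cuts registered without a label.
    """
    return [deco for deco, label in cuts if label not in exclude_from]


cuts = [
    ("id_tight", None),
    ("overlap_removal", "or"),   # registered with a label
    ("pt_cut", None),
]

# Everywhere else: all cuts apply.
print(read_selection(cuts))
# One specific reader drops the labelled cut.
print(read_selection(cuts, exclude_from={"or"}))
```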
Algorithm Level¶
At the algorithm level you will generally read each selection via an
ISelectionReadAccessor, SelectionReadHandle, or
SysReadSelectionHandle, with the latter two being wrappers around the
former. In most cases the best choice is the SelectionReadHandle,
which reads a simple selection without systematics and declares a
property for it. If you actually need systematics on the input selection
(for full physics selections), you have to use SysReadSelectionHandle
instead.
When selections depend on systematics, the SysReadSelectionHandle
automatically manages decoration name variants with the _%SYS% suffix
pattern (e.g. mySelection_JET_Resolution__1up). The handle internally
caches accessors per systematic set for efficient access. For details on
systematic handling, see the Systematics Handling page.
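The name substitution and per-systematic caching can be pictured with the sketch below. It is a simplification: the real handle caches accessors rather than strings, the class `SysNameCache` is hypothetical, and the treatment of the nominal case is deliberately left out.

```python
class SysNameCache:
    """Toy model: resolve a %SYS% name pattern once per systematic."""

    def __init__(self, pattern):
        self.pattern = pattern
        self._cache = {}

    def name_for(self, systematic):
        if systematic not in self._cache:
            # Only computed the first time a systematic is requested.
            self._cache[systematic] = self.pattern.replace("%SYS%", systematic)
        return self._cache[systematic]


names = SysNameCache("mySelection_%SYS%")
print(names.name_for("JET_Resolution__1up"))
```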
In general, an object-level algorithm should define a preselection property, so that objects rejected by upstream algorithms can be skipped. This is considered good practice even if you expect to run first. That looks something like:
SysReadSelectionHandle m_preselection {this, "preselection", "", "the preselection to apply"};
for (xAOD::Muon *muon : *muons)
{
  if (m_preselection.getBool (*muon, sys))
  {
    ...
  } else
  {
    ...
  }
}
The else branch is very important. Even for skipped objects
you will generally have to write out all the decorations your algorithm
would set.
An empty preselection string (the default if not configured) means no preselection is applied - all objects pass. This is implemented via a special null accessor that always returns true, allowing algorithms to run on all objects in the container.
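The interplay between the null accessor and the else branch can be modelled in a few lines. This is a toy model using dictionaries as objects; the class names, the `scaleFactor` decoration and the default value of -1.0 are all hypothetical.

```python
class NullSelection:
    """Stands in for the null accessor: every object passes."""

    def get_bool(self, obj):
        return True


class CharSelection:
    """Reads a single char-like decoration from a dict-based object."""

    def __init__(self, decoration):
        self.decoration = decoration

    def get_bool(self, obj):
        return bool(obj.get(self.decoration, 0))


def make_read_accessor(expr):
    # An empty string means no preselection is applied.
    return CharSelection(expr) if expr else NullSelection()


def decorate_scale_factor(muons, preselection=""):
    accessor = make_read_accessor(preselection)
    for muon in muons:
        if accessor.get_bool(muon):
            muon["scaleFactor"] = 0.98   # placeholder for a real calculation
        else:
            muon["scaleFactor"] = -1.0   # default, so the decoration always exists


muons = [{"presel": 1}, {"presel": 0}]
decorate_scale_factor(muons, "presel")
print(muons)
```

Every object ends up with the decoration set, whether or not it passed the preselection, which is the point of the else branch.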
For writing out selections there is the ISelectionWriteAccessor,
SelectionWriteHandle, or SysWriteSelectionHandle that have analogous
functionality. Though (for boolean selections) you can also just write
out a char type decoration normally, and then register it with an
added ,as_char in the configuration. Since there are no logic
expressions when writing, a lot of the selection accessor mechanism goes
unused.
If you have a selection tool that gives you an AcceptData bit mask
described via an AcceptInfo, you will need the ISelectionNameSvc in
your algorithm:
ServiceHandle<ISelectionNameSvc> m_nameSvc {"SelectionNameSvc", "MuonSelectionAlg"};
In initialize:
if (!m_nameSvc.empty())
{
  ANA_CHECK (m_nameSvc.retrieve());
  ANA_CHECK (m_nameSvc->addAcceptInfo (m_muonsHandle.getNamePattern(),
      m_selectionHandle.getLabel(), m_selectionTool->getAcceptInfo()));
}
In execute, pass out the selection via setBits:
m_selectionHandle.setBits (*muon, selectionFromAccept (m_selectionTool->accept (*muon)), sys);
Notes from the Developers¶
It should be acknowledged that the selection handling as we have it today is a significant evolution of the original design to meet user requirements. As such, some aspects are not how we would have built them if we started from scratch, and further redesigns may be needed in the future.
While building up selections step-by-step works great for preselections, it may make sense to lock a selection once it is used as a full physics selection to avoid ordering mistakes. This would probably need an "escape hatch" in case this is the desired behavior, but in most cases it is likely to be a bug.
It might be nice if the excludeFrom mechanism could also be expressed
through the logic expressions, e.g. specify something like
tight{exclude=OR} and then you get the tight selection without overlap
removal.
The whole as_char vs as_bits mechanism seems way too complicated,
particularly given how niche a feature it is. It would probably be a lot
better if every single selection decoration was of type char and if a
bit mask for individual cuts is needed just add a second decoration.
That would completely eliminate the need to do anything special for
writing out selections, they'd just all be written like regular char
decorations. There could still be some special helpers for handling
selections that are bit-masks, but most selections are not.
And even for reading selections for which we need to support formulas,
it may still be much more practical to treat them as simple char
decorations at the algorithm level. The idea would essentially be that
if a selection doesn't exist as a single char decoration it would be
created before the algorithm is run, so the algorithm doesn't have to
have special handling and can just read a char. While it may sound
like this would create a lot of temporaries, in many cases the
decoration would exist already:
- For preselection decorations, whenever a new preselection is created the previous preselection will already be integrated (as the new one automatically fails for every object that failed the previous preselection).
- For full selections we will typically have to create the selection decorations at some point anyways, to write it to disk.
- The main exception would be logic expressions (e.g. loose||tight), which would have to be evaluated and decorated on the object. Same for selections that exclude a cut, etc.
For moving to columnar we would likely have to move to a model of
reading/writing char to avoid overly complex logic in the tool. If
anything we may need some special handling for bit fields for selections
in columnar mode, but that would likely be absorbed in the columnar
infrastructure.