Columnar Tool Base
There is a common base class for all tools and all other classes that need
to hold columnar accessors: ColumnarTool<>. All classes that hold accessors
need to derive from it because all accessors need to be tracked (and at
times also updated).
So if you have a subobject that needs accessors, you need to derive it
from ColumnarTool<> to be able to add them. You also need to connect
the two objects, either by passing a pointer to the other tool into the
constructor of the new tool, or by calling addSubtool(...) to add it
after it is created. The name addSubtool may suggest that the tools are
organized in a tree-like structure, but that is no longer the case: the
ownership structure of tools and their subobjects is at times too
complicated for such a simple hierarchy. Instead, this now just links the
backends, and no assumptions are made about the organizational structure
of the objects.
Note that not every object that needs accessors has to hold them directly; you can also just pass in another object that holds them. That is obvious, but also easy to forget. Use whatever best matches your current situation.
There is little overhead for declaring the same accessor more than
once, so if you feel that every subobject needs its own pt accessor
to be self-contained, that is perfectly fine.
Note that in the past every subobject needed to hold its own
ObjectColumn for every single container it used. That is no longer
necessary, nor is it recommended. You only need it once (typically in
the top-level tool) to declare the name of the container. As you link
the tool to its subtools, the container names get transferred as well.
Important: Objects that hold columnar accessors are incompatible with ROOT dictionaries. The issue is that we don't have a dictionary for every single columnar accessor (and won't create them), but ROOT requires them, even for private members that would never be accessed via dictionaries. We are working to reduce the use of dictionaries, but in the meantime the recommended workaround is to create a separate subobject that holds all the needed accessors (connected as explained above). This is only needed for the top-level tool, not for subtools.
For the top-level tool you will generally need to do two things:
- Call initializeColumns once all column accessors have been declared and all subobjects have been connected. If you add more column accessors later on, you have to call it again.
- Implement a callEvents function that applies the tool to all events in the input column.
callEvents Function
To support columnar calls, tools must implement:
virtual void callEvents (EventContextRange events) const override;
This function should internally loop over all events, and over all objects in each event, to do whatever the tool is meant to do. In that sense it resembles what an algorithm, rather than a tool, would do in Athena. However, to work effectively as a columnar tool, a tool needs to operate on entire columns, not just individual objects, and that means running on an entire block of events at once.
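The toy sketch below shows the overall shape of such a function: a single call processes a whole block of events, looping over events and then over the objects in each event. The flat layout with an eventOffsets column, the names ToyColumns and ToyPtCutTool, and the boolean stand-in for initializeColumns are all assumptions for illustration; the real EventContextRange and accessor machinery look different.

```cpp
// Toy sketch of a callEvents-style loop; all names are hypothetical.
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed flat layout: objects of all events are stored back to back, and
// eventOffsets[i] .. eventOffsets[i + 1] is the object range of event i.
struct ToyColumns {
  std::vector<std::size_t> eventOffsets;  // size = number of events + 1
  std::vector<float> pt;                  // input column, one entry per object
  std::vector<char> selected;             // output column, filled by the tool
};

class ToyPtCutTool {
public:
  explicit ToyPtCutTool(float minPt) : m_minPt(minPt) {}

  // Stand-in for initializeColumns: call once all columns are declared.
  void initializeColumns() { m_initialized = true; }

  // Stand-in for callEvents: processes an entire block of events per call.
  void callEvents(ToyColumns& columns) const {
    assert(m_initialized && "initializeColumns must be called first");
    columns.selected.assign(columns.pt.size(), 0);
    for (std::size_t event = 0; event + 1 < columns.eventOffsets.size(); ++event) {
      for (std::size_t obj = columns.eventOffsets[event];
           obj < columns.eventOffsets[event + 1]; ++obj) {
        columns.selected[obj] = columns.pt[obj] > m_minPt ? 1 : 0;
      }
    }
  }

private:
  float m_minPt;
  bool m_initialized = false;
};
```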
This design also means that all columnar tools share a uniform interface
(IColumnarTool) that can be called from the python binding layer in a
uniform way, instead of needing bespoke python bindings for every tool.
It also allows bundling multiple simpler tools into a single python call
(as, e.g., correctionlib does for efficiency).
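As a rough illustration of why a uniform interface helps, the sketch below uses a hypothetical IToyColumnarTool with a single callEvents entry point. A generic caller, standing in for the python binding layer, can run any combination of tools without tool-specific code. None of these types are the real IColumnarTool.

```cpp
// Toy uniform interface; all names are hypothetical.
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Flattened per-object columns for a block of events (hypothetical layout).
struct ToyEventBlock {
  std::vector<float> pt;      // input column
  std::vector<float> weight;  // in/out column, one entry per object
};

// One entry point shared by every tool.
class IToyColumnarTool {
public:
  virtual ~IToyColumnarTool() = default;
  virtual void callEvents(ToyEventBlock& events) const = 0;
};

// Sets the weight of objects below a pt threshold to zero.
class ToyThresholdTool : public IToyColumnarTool {
public:
  explicit ToyThresholdTool(float minPt) : m_minPt(minPt) {}
  void callEvents(ToyEventBlock& events) const override {
    for (std::size_t i = 0; i < events.pt.size(); ++i)
      if (events.pt[i] < m_minPt) events.weight[i] = 0.f;
  }

private:
  float m_minPt;
};

// Multiplies every weight by a constant factor.
class ToyScaleTool : public IToyColumnarTool {
public:
  explicit ToyScaleTool(float factor) : m_factor(factor) {}
  void callEvents(ToyEventBlock& events) const override {
    for (float& w : events.weight) w *= m_factor;
  }

private:
  float m_factor;
};

// Generic caller: no tool-specific code, so several tools can be bundled
// into one call.
void callAll(const std::vector<std::unique_ptr<IToyColumnarTool>>& tools,
             ToyEventBlock& events) {
  for (const auto& tool : tools) tool->callEvents(events);
}
```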
Tool Interfaces
Usually each Athena tool has a dedicated interface class (IMyTool).
These interfaces are generally not used for columnar tools, so we
typically leave the existing xAOD interfaces in place, which also means
we don't break existing code using the tool. The main reason to update
the interface class is when the tool is a subtool of another columnar
tool that calls it directly through that interface.
Instead, what we generally do is keep all the existing functions in place and, for each function needed in columnar mode, create a new function with the same name that takes columnar objects as arguments. The code from the old function is moved into the new function and updated to columnar, and the old function then simply converts the xAOD objects to columnar objects and calls the new function internally.
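A minimal sketch of this pattern, using hypothetical stand-ins: ToyXAODMuon for an xAOD object, ToyMuonId for a columnar object handle, and ToyEfficiencyTool for the tool. The logic lives in the columnar overload, while the original xAOD overload keeps its signature and only converts and forwards.

```cpp
// Toy xAOD-to-columnar forwarding; all names are hypothetical.
#include <cassert>
#include <cmath>

// Stand-in for an xAOD object.
struct ToyXAODMuon {
  float pt;
  float eta;
};

// Stand-in for a columnar object handle: it only points at the values,
// which could live in columns or, in xAOD mode, in the xAOD object.
struct ToyMuonId {
  const float* pt;
  const float* eta;
};

class ToyEfficiencyTool {
public:
  // New columnar function: the logic from the old function now lives here.
  float efficiency(ToyMuonId muon) const {
    return (std::abs(*muon.eta) < 2.5f && *muon.pt > 10.f) ? 0.9f : 0.f;
  }

  // Existing xAOD function keeps its signature; it only converts and forwards.
  float efficiency(const ToyXAODMuon& muon) const {
    return efficiency(ToyMuonId{&muon.pt, &muon.eta});
  }
};
```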
When writing the new function, it is strongly preferred to use ObjectId
over OptObjectId, i.e. the version that guarantees the object exists. A
special shoutout here goes to tools that internally retrieve
xAOD::EventInfo: it should instead be retrieved in the xAOD interface
function and then passed as an argument into the columnar function.
Ideally you'd also update the xAOD interface to allow passing it in
there (for efficiency).
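The sketch below illustrates both recommendations with hypothetical types. The columnar function takes a guaranteed-valid object (a reference, playing the role of ObjectId) plus the event info as an explicit argument, while the xAOD-facing function fetches the event info (here from a hypothetical ToyEventStore) and forwards it. An optional-object variant (a pointer plus std::optional, playing the role of OptObjectId) is included only for contrast.

```cpp
// Toy illustration; all names are hypothetical.
#include <cassert>
#include <optional>

struct ToyEventInfo {
  bool isSimulation;
};

struct ToyJet {
  float pt;
};

// Stand-in for wherever the xAOD code used to fetch the event info.
struct ToyEventStore {
  ToyEventInfo eventInfo;
};

class ToyCalibTool {
public:
  explicit ToyCalibTool(const ToyEventStore* store) : m_store(store) {}

  // Preferred columnar form: a guaranteed-valid object and the event info
  // passed in explicitly rather than retrieved internally.
  float calibratedPt(const ToyJet& jet, const ToyEventInfo& eventInfo) const {
    return eventInfo.isSimulation ? 1.02f * jet.pt : jet.pt;
  }

  // xAOD-facing function: retrieves the event info once and forwards it.
  float calibratedPt(const ToyJet& jet) const {
    return calibratedPt(jet, m_store->eventInfo);
  }

  // For contrast: a possibly-missing object forces an extra check here and
  // an optional result that every caller has to unpack.
  std::optional<float> calibratedPtOpt(const ToyJet* jet,
                                       const ToyEventInfo& eventInfo) const {
    if (jet == nullptr) return std::nullopt;
    return calibratedPt(*jet, eventInfo);
  }

private:
  const ToyEventStore* m_store;
};
```

A caller that already has the event info, for example when looping over many jets in one event, can use the two-argument overload directly instead of triggering a retrieval per jet.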
If you choose to add columnar functions to the interface, you can add the new columnar function either in addition to the xAOD function or as a replacement for it. All xAOD objects implicitly convert to their columnar equivalents, so offering only columnar versions doesn't necessarily break xAOD interfaces (though it requires all xAOD objects to be passed by reference rather than by pointer, which breaks some conventions).
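A toy illustration of the reference-only conversion, assuming a columnar handle type with a non-explicit converting constructor from a const reference (ToyXAODElectron, ToyElectronId, and scaledPt are hypothetical). Callers holding an xAOD object can pass it directly, while callers holding a pointer must dereference it, because a pointer does not convert.

```cpp
// Toy implicit conversion; all names are hypothetical.
#include <cassert>

// Stand-in for an xAOD object.
struct ToyXAODElectron {
  float pt;
};

// Stand-in for a columnar handle. The converting constructor is deliberately
// non-explicit, so an xAOD object passed by reference converts implicitly.
class ToyElectronId {
public:
  ToyElectronId(const ToyXAODElectron& electron) : m_pt(&electron.pt) {}
  float pt() const { return *m_pt; }

private:
  const float* m_pt;
};

// The interface offers only the columnar version of the function.
float scaledPt(ToyElectronId electron) { return 2.f * electron.pt(); }
```

Calling scaledPt(ptr) with a pointer would fail to compile, which is the convention break mentioned above.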
Another reason to update the interface to columnar is if you have multiple implementations and don't want to add the same xAOD-to-columnar conversion to every single implementation. However, note that this requires all of the implementations to be converted to columnar, or at least to convert back to xAOD in the ones that aren't.
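One possible arrangement, sketched here with hypothetical types, is to implement the xAOD entry point once as a non-virtual function in the interface and let every implementation provide only the columnar virtual function. This illustrates the idea; it does not describe any existing interface.

```cpp
// Toy interface doing the conversion once; all names are hypothetical.
#include <cassert>

struct ToyXAODTau {
  float pt;  // stand-in for an xAOD object
};

struct ToyTauId {
  const float* pt;  // stand-in for a columnar handle
};

class IToyTauTool {
public:
  virtual ~IToyTauTool() = default;

  // Columnar entry point: the only thing each implementation provides.
  virtual bool accept(ToyTauId tau) const = 0;

  // xAOD entry point: the conversion is written once, here.
  bool accept(const ToyXAODTau& tau) const {
    return accept(ToyTauId{&tau.pt});
  }
};

class ToyLooseTauTool : public IToyTauTool {
public:
  using IToyTauTool::accept;  // keep the xAOD overload visible
  bool accept(ToyTauId tau) const override { return *tau.pt > 20.f; }
};

class ToyTightTauTool : public IToyTauTool {
public:
  using IToyTauTool::accept;  // keep the xAOD overload visible
  bool accept(ToyTauId tau) const override { return *tau.pt > 40.f; }
};
```

The using-declarations are needed because a derived class that overrides one overload of accept hides the base class's other overloads.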
Columnar Tool Tests
In general we define two tests for every tool:
- An in-memory test that takes a predefined list of input and output columns and checks that the tool reproduces the output columns from the input columns. This only runs in columnar mode.
- A file-based test that reads the input columns from a PHYSLITE file and tests that the tool runs on those columns. This checks that the tool works with real data and collects various performance metrics.
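The comparison at the heart of the in-memory test can be sketched as follows, assuming a hypothetical toy tool that doubles an input column and a fixed absolute tolerance. Real tests use the actual tool and whatever comparison the existing test setup provides.

```cpp
// Toy in-memory check; all names are hypothetical.
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tool under test: doubles every entry of an input column.
void toyTool(const std::vector<float>& input, std::vector<float>& output) {
  output.resize(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) output[i] = 2.f * input[i];
}

// Runs the tool on predefined input columns and compares the result against
// predefined expected output columns within an absolute tolerance.
bool reproducesExpected(const std::vector<float>& input,
                        const std::vector<float>& expected, float tolerance) {
  std::vector<float> output;
  toyTool(input, output);
  if (output.size() != expected.size()) return false;
  for (std::size_t i = 0; i < output.size(); ++i)
    if (std::abs(output[i] - expected[i]) > tolerance) return false;
  return true;
}
```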
The exact layout of these tests (particularly the PHYSLITE tests) is a bit in flux as we extend them to support both more tools and more metrics we want to monitor.
At the very least there should be a single PHYSLITE test for each tool (unless the tool is not meant to run on PHYSLITE), so that we can collect performance metrics from it.
For the actual implementation, look at the existing tests to see what they do, and copy that or do something similar.