PerfMonMT¶
Introduction¶
The PerfMonMTSvc is a thread-safe service providing access to custom-built tools for AthenaMT. It uses various Gaudi/Athena hooks to collect resource usage information and provides information on various levels:
- Summary statistics: Maximum memory usage, throughput, etc.
- Component statistics: CPU/wall-time and memory usage per component
- Event statistics: Resource usage evolution within the event loop
During job configuration, users can assign a group of algorithms to specific domains that are then used in the SPOT daily tests when making resource usage plots. Since the SPOT component plots use only approximately ten of the most expensive algorithms per plot, the increase of resource usage in a specific domain helps shifters identify where problems appear.
Using PerfMonMT¶
For transform jobs, the perfmon options are already integrated and can be easily enabled with the --perfmon option (values: none, fastmonmt [default], fullmonmt). For example:
ATHENA_CORE_NUMBER=8 Reco_tf.py \
--CA \
--AMI q445 \
--maxEvents 100 \
--multithreaded="True" \
--conditionsTag "$(python -c 'from AthenaConfiguration.TestDefaults import defaultConditionsTags; print(defaultConditionsTags.RUN3_MC)')" \
--outputAODFile myAOD.MT.pool.root \
--perfmon fullmonmt
For user-defined job options, you can enable PerfMonMT by setting the appropriate flags and merging the service configuration:
# Set the necessary flags
flags.PerfMon.doFastMonMT = True
flags.PerfMon.OutputJSON = 'perfmonmt_test.json'
# Merge the relevant service configuration
from PerfMonComps.PerfMonCompsConfig import PerfMonMTSvcCfg
acc.merge(PerfMonMTSvcCfg(flags))
Note
If you are using MainServicesCfg, PerfMonMTSvcCfg is automatically included and you do not need to manually merge it as shown above.
The example above enables the default fast monitoring mode, which outputs the following information:
- Resource usage when event N is in flight
- Changes in resource usage between certain points
- Summary statistics (throughput, etc.)
- Machine and environment information
An example log output is shown below:
PerfMonMTSvc INFO =======================================================================================
PerfMonMTSvc INFO Event Level Monitoring
PerfMonMTSvc INFO (Only the first 10 and the last measurements are explicitly printed)
PerfMonMTSvc INFO =======================================================================================
PerfMonMTSvc INFO Event CPU [s] Wall [s] Vmem [kB] Rss [kB] Pss [kB] Swap [kB]
PerfMonMTSvc INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc INFO 1 274.73 283.32 5852376 4736336 4733328 0
PerfMonMTSvc INFO 2 423.40 350.92 8640184 7461984 7458990 0
PerfMonMTSvc INFO 25 655.36 380.95 10915780 9767124 9764130 0
PerfMonMTSvc INFO 45 903.57 411.86 11189092 10069572 10066578 0
PerfMonMTSvc INFO 68 1160.43 443.85 11237524 10109504 10106510 0
PerfMonMTSvc INFO 92 1401.89 473.95 11243396 10121236 10118229 0
PerfMonMTSvc INFO 111 1651.85 506.07 11257936 10158256 10155259 0
PerfMonMTSvc INFO 133 1915.59 539.51 11275056 10175780 10172782 0
PerfMonMTSvc INFO 153 2157.11 569.74 11431616 10327076 10324078 0
PerfMonMTSvc INFO 177 2423.99 602.96 11447456 10333556 10330558 0
...
PerfMonMTSvc INFO 982 12893.39 1927.34 12208048 11083644 11080707 0
INFO =======================================================================================
PerfMonMTSvc INFO Snapshots Summary
PerfMonMTSvc INFO =======================================================================================
PerfMonMTSvc INFO Step dCPU [s] dWall [s] <CPU> dVmem [kB] dRss [kB] dPss [kB] dSwap [kB]
PerfMonMTSvc INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc INFO Configure 119.33 123.739 0.96 1910384 1418476 1415640 0
PerfMonMTSvc INFO Initialize 134.65 136.647 0.99 2485500 1916532 1916368 0
PerfMonMTSvc INFO FirstEvent 163.66 83.894 1.95 3907400 3751616 3751632 0
PerfMonMTSvc INFO Execute 12775.7 1620.08 7.89 3567864 3648092 3648147 0
PerfMonMTSvc INFO Finalize 92.69 91.05 1.02 259976 -4910788 -4910442 0
PerfMonMTSvc INFO ***************************************************************************************
PerfMonMTSvc INFO Number of events processed: 1000
PerfMonMTSvc INFO CPU usage per event [ms]: 12939
PerfMonMTSvc INFO Events per second: 0.587
PerfMonMTSvc INFO CPU utilization efficiency [%]: 99
INFO***************************************************************************************
PerfMonMTSvc INFO Max Vmem: 11.64 GB
PerfMonMTSvc INFO Max Rss: 10.59 GB
PerfMonMTSvc INFO Max Pss: 10.59 GB
PerfMonMTSvc INFO Max Swap: 0.00 KB
INFO ***************************************************************************************
PerfMonMTSvc INFO Leak estimate per event Vmem: 1.08 MB
PerfMonMTSvc INFO Leak estimate per event Pss: 1.05 MB
PerfMonMTSvc INFO >> Estimated using the last 37 measurements from the Event Level Monitoring
PerfMonMTSvc INFO >> Events prior to the first 300 are omitted
...
INFO =======================================================================================
PerfMonMTSvc INFO System Information
INFO =======================================================================================
PerfMonMTSvc INFO CPU Model: AMD EPYC 7302 16-Core Processor 512 KB
PerfMonMTSvc INFO Number of Available Cores: 32
PerfMonMTSvc INFO Total Memory: 251.67 GB
INFO =======================================================================================
PerfMonMTSvc INFO Environment Information
INFO =======================================================================================
PerfMonMTSvc INFO Malloc Library: libtcmalloc_minimal.so
PerfMonMTSvc INFO Math Library: libimf.so
More detailed information is available in the fullmonmt mode. In addition to the fastmonmt information, it contains component-level metrics for:
Initialize,FirstEvent,Execute,Finalize,Callbacks,preLoadProxy
PerfMonMTSvc INFO =======================================================================================
PerfMonMTSvc INFO PerfMonMTSvc Report
PerfMonMTSvc INFO
PerfMonMTSvc INFO Component Level Monitoring
PerfMonMTSvc INFO =======================================================================================
PerfMonMTSvc INFO Step Count CPU Time [ms] Vmem [kB] Malloc [kB] Component
PerfMonMTSvc INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc INFO Initialize 1 44376.84 0 153893 AthMonSeq_TrigJetMonitorAlgorithm
PerfMonMTSvc INFO Initialize 1 9740.13 39200 86544 AthMonSeq_TrigEgammaAthMonitorCfg
PerfMonMTSvc INFO Initialize 1 8956.97 17408 17920 AthMonSeq_JetMonitoring
...
INFO=======================================================================================
PerfMonMTSvc INFO FirstEvent 1 37515.43 0 0 CondInputLoader
PerfMonMTSvc INFO FirstEvent 1 16442.71 0 0 NswCalibDbAlg
PerfMonMTSvc INFO FirstEvent 1 8017.96 0 0 TrigDeserialiser
PerfMonMTSvc INFO FirstEvent 1 4463.59 0 0 MuonDetectorCondAlg
INFO =======================================================================================
PerfMonMTSvc INFO Execute 999 1136392.37 0 0 InDetSiSpTrackFinder
PerfMonMTSvc INFO Execute 999 1034468.73 0 0 InDetSiSpTrackFinderR3LargeD0
PerfMonMTSvc INFO Execute 999 958990.57 0 0 InDetAmbiguitySolver
...
INFO=======================================================================================
PerfMonMTSvc INFO Finalize 1 53148.94 0 -125681 ToolSvc
PerfMonMTSvc INFO Finalize 1 23012.33 0 -3176 HLTDecodingSeq
PerfMonMTSvc INFO Finalize 1 23012.14 0 -3176 TrigDeserialiser
...
INFO=======================================================================================
PerfMonMTSvc INFO preLoadProxy 1 15081.71 0 0 loadCachesOverhead:COOLONL_TRIGGER/CONDBR2
PerfMonMTSvc INFO preLoadProxy 1 12527.71 0 0 DetCondKeyTrans[/LAR/Align]
PerfMonMTSvc INFO preLoadProxy 1 12527.69 0 0 UpdateAddr::/LAR/Align
...
INFO=======================================================================================
PerfMonMTSvc INFO Callback 1 57.50 0 0 TRT_DetectorTool[0x3a859da8]+e9
PerfMonMTSvc INFO Callback 1 42.41 0 0 SCT_DetectorTool[0x40abd5b0]+e9
PerfMonMTSvc INFO Callback 1 36.51 0 0 PixelDetectorTool[0x40abe670]+e9
...
Note
Component measurements are collected without locks (no synchronization) and metrics are aggregated across all calls. This means that vmem/malloc metrics in the event loop are only available in serial mode (--threads=1).
Analyzing PerfMon Output¶
By default, the information is collected in the perfmonmt_jobname.json.tar.gz file, which can be analyzed with scripts provided by PerfMonComps:
perfmonmt-plotter.py- Creates event/component level plotsperfmonmt-printer.py- Prints untrimmed/ordered usage statisticsperfmonmt-refit.py- Re-fits memory usage in different event slices
Here is an example of snapshot metrics from perfmonmt-plotter.py:

Configuring Domains in Component Accumulator¶
New algorithms can be added to a PerfMon domain using the following example:
# Muon
acc.flagPerfmonDomain('Muon')
if flags.Detector.EnableMuon:
from MuonConfig.MuonReconstructionConfig import MuonReconstructionCfg
acc.merge(MuonReconstructionCfg(flags))
log.info("---------- Configured muon tracking")
# EGamma
acc.flagPerfmonDomain('EGamma')
if flags.Reco.EnableEgamma:
from egammaConfig.egammaSteeringConfig import EGammaSteeringCfg
acc.merge(EGammaSteeringCfg(flags))
log.info("---------- Configured e/gamma")
# Caching of CaloExtension for downstream
# Combined Performance algorithms
acc.flagPerfmonDomain('CaloExtension')
if flags.Reco.EnableCaloExtension:
from TrackToCalo.CaloExtensionBuilderAlgCfg import CaloExtensionBuilderCfg
acc.merge(CaloExtensionBuilderCfg(flags))
log.info("---------- Configured track calorimeter extension builder")
# Muon Combined
acc.flagPerfmonDomain('CombinedMuon')
if flags.Reco.EnableCombinedMuon:
from MuonCombinedConfig.MuonCombinedReconstructionConfig import (
MuonCombinedReconstructionCfg)
acc.merge(MuonCombinedReconstructionCfg(flags))
log.info("---------- Configured combined muon reconstruction")
The configured domains are listed in the log:
Py:ComponentAccumulator INFO :: This CA contains the following PerfMon domains ::
Py:ComponentAccumulator INFO :: There are a total of 626 registered algorithms in 22 domains ::