PerfMonMT

Introduction

PerfMonMTSvc is a thread-safe service that provides access to custom-built performance monitoring tools for AthenaMT. It uses Gaudi/Athena hooks to collect resource usage information and reports it at three levels:

  • Summary statistics: Maximum memory usage, throughput, etc.
  • Component statistics: CPU/wall-time and memory usage per component
  • Event statistics: Resource usage evolution within the event loop

During job configuration, users can assign groups of algorithms to named domains, which the SPOT daily tests then use when producing resource usage plots. Because each SPOT component plot shows only about ten of the most expensive algorithms, an increase in the resource usage of a whole domain helps shifters identify where a problem appears.

Using PerfMonMT

For transform jobs, PerfMon is already integrated and is controlled with the --perfmon option, which accepts none, fastmonmt (the default), and fullmonmt. For example:

ATHENA_CORE_NUMBER=8 Reco_tf.py \
    --CA \
    --AMI q445 \
    --maxEvents 100 \
    --multithreaded="True" \
    --conditionsTag "$(python -c 'from AthenaConfiguration.TestDefaults import defaultConditionsTags; print(defaultConditionsTags.RUN3_MC)')" \
    --outputAODFile myAOD.MT.pool.root \
    --perfmon fullmonmt

For user-defined job options, you can enable PerfMonMT by setting the appropriate flags and merging the service configuration:

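# `flags` (the job's configuration flags) and `acc` (its ComponentAccumulator)
# are assumed to exist already in your job options.
from PerfMonComps.PerfMonCompsConfig import PerfMonMTSvcCfg

# Set the necessary flags
flags.PerfMon.doFastMonMT = True
flags.PerfMon.OutputJSON = 'perfmonmt_test.json'

# Merge the relevant service configuration
acc.merge(PerfMonMTSvcCfg(flags))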

Note

If you are using MainServicesCfg, PerfMonMTSvcCfg is automatically included and you do not need to manually merge it as shown above.

The example above enables the default fast monitoring mode, which outputs the following information:

  • Resource usage when event N is in flight
  • Changes in resource usage between job steps (Configure, Initialize, FirstEvent, Execute, Finalize)
  • Summary statistics (throughput, etc.)
  • Machine and environment information

An example log output is shown below:

PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO                                 Event Level Monitoring
PerfMonMTSvc   INFO         (Only the first 10 and the last measurements are explicitly printed)
PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO Event           CPU [s]     Wall [s]    Vmem [kB]   Rss [kB]    Pss [kB]    Swap [kB]
PerfMonMTSvc   INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc   INFO 1               274.73      283.32      5852376     4736336     4733328     0
PerfMonMTSvc   INFO 2               423.40      350.92      8640184     7461984     7458990     0
PerfMonMTSvc   INFO 25              655.36      380.95      10915780    9767124     9764130     0
PerfMonMTSvc   INFO 45              903.57      411.86      11189092    10069572    10066578    0
PerfMonMTSvc   INFO 68              1160.43     443.85      11237524    10109504    10106510    0
PerfMonMTSvc   INFO 92              1401.89     473.95      11243396    10121236    10118229    0
PerfMonMTSvc   INFO 111             1651.85     506.07      11257936    10158256    10155259    0
PerfMonMTSvc   INFO 133             1915.59     539.51      11275056    10175780    10172782    0
PerfMonMTSvc   INFO 153             2157.11     569.74      11431616    10327076    10324078    0
PerfMonMTSvc   INFO 177             2423.99     602.96      11447456    10333556    10330558    0                                         
...
PerfMonMTSvc   INFO 982             12893.39    1927.34     12208048    11083644    11080707    0
INFO =======================================================================================
PerfMonMTSvc   INFO                                 Snapshots Summary
PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO Step         dCPU [s]    dWall [s]   <CPU>  dVmem [kB] dRss [kB]  dPss [kB]  dSwap [kB]
PerfMonMTSvc   INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc   INFO Configure    119.33      123.739     0.96   1910384    1418476    1415640    0
PerfMonMTSvc   INFO Initialize   134.65      136.647     0.99   2485500    1916532    1916368    0
PerfMonMTSvc   INFO FirstEvent   163.66      83.894      1.95   3907400    3751616    3751632    0
PerfMonMTSvc   INFO Execute      12775.7     1620.08     7.89   3567864    3648092    3648147    0
PerfMonMTSvc   INFO Finalize     92.69       91.05       1.02   259976     -4910788   -4910442   0
PerfMonMTSvc   INFO ***************************************************************************************
PerfMonMTSvc   INFO Number of events processed:        1000
PerfMonMTSvc   INFO CPU usage per event [ms]:          12939
PerfMonMTSvc   INFO Events per second:                 0.587
PerfMonMTSvc   INFO CPU utilization efficiency [%]:    99
INFO***************************************************************************************
PerfMonMTSvc   INFO Max Vmem:                          11.64 GB
PerfMonMTSvc   INFO Max Rss:                           10.59 GB
PerfMonMTSvc   INFO Max Pss:                           10.59 GB
PerfMonMTSvc   INFO Max Swap:                          0.00 KB   
INFO ***************************************************************************************
PerfMonMTSvc   INFO Leak estimate per event Vmem:      1.08 MB
PerfMonMTSvc   INFO Leak estimate per event Pss:       1.05 MB
PerfMonMTSvc   INFO   >> Estimated using the last 37 measurements from the Event Level Monitoring
PerfMonMTSvc   INFO   >> Events prior to the first 300 are omitted
...
INFO =======================================================================================
PerfMonMTSvc   INFO                                   System Information
INFO =======================================================================================
PerfMonMTSvc   INFO CPU Model:                         AMD EPYC 7302 16-Core Processor 512 KB
PerfMonMTSvc   INFO Number of Available Cores:         32
PerfMonMTSvc   INFO Total Memory:                      251.67 GB
INFO =======================================================================================
PerfMonMTSvc   INFO                                Environment Information
INFO =======================================================================================
PerfMonMTSvc  INFO Malloc Library:                    libtcmalloc_minimal.so
PerfMonMTSvc  INFO Math Library:                      libimf.so

More detailed information is available in the fullmonmt mode. In addition to the fastmonmt information, it contains component-level metrics for:

  • Initialize, FirstEvent, Execute, Finalize, Callbacks, preLoadProxy

An example of the component-level log output is shown below:

PerfMonMTSvc     INFO =======================================================================================
PerfMonMTSvc     INFO                                  PerfMonMTSvc Report
PerfMonMTSvc     INFO 
PerfMonMTSvc     INFO                              Component Level Monitoring
PerfMonMTSvc     INFO =======================================================================================
PerfMonMTSvc     INFO Step           Count     CPU Time [ms]  Vmem [kB]      Malloc [kB]         Component
PerfMonMTSvc     INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc     INFO Initialize     1         44376.84       0              153893              AthMonSeq_TrigJetMonitorAlgorithm
PerfMonMTSvc     INFO Initialize     1         9740.13        39200          86544               AthMonSeq_TrigEgammaAthMonitorCfg
PerfMonMTSvc     INFO Initialize     1         8956.97        17408          17920               AthMonSeq_JetMonitoring                                           
...
INFO=======================================================================================
PerfMonMTSvc     INFO FirstEvent     1         37515.43       0              0                   CondInputLoader
PerfMonMTSvc     INFO FirstEvent     1         16442.71       0              0                   NswCalibDbAlg
PerfMonMTSvc     INFO FirstEvent     1         8017.96        0              0                   TrigDeserialiser
PerfMonMTSvc     INFO FirstEvent     1         4463.59        0              0                   MuonDetectorCondAlg
INFO =======================================================================================
PerfMonMTSvc     INFO Execute        999       1136392.37     0              0                   InDetSiSpTrackFinder
PerfMonMTSvc     INFO Execute        999       1034468.73     0              0                   InDetSiSpTrackFinderR3LargeD0
PerfMonMTSvc     INFO Execute        999       958990.57      0              0                   InDetAmbiguitySolver                                              
...
INFO=======================================================================================
PerfMonMTSvc    INFO Finalize       1         53148.94       0              -125681             ToolSvc
PerfMonMTSvc    INFO Finalize       1         23012.33       0              -3176               HLTDecodingSeq
PerfMonMTSvc    INFO Finalize       1         23012.14       0              -3176               TrigDeserialiser                                             
...
INFO=======================================================================================
PerfMonMTSvc   INFO preLoadProxy   1         15081.71       0              0                   loadCachesOverhead:COOLONL_TRIGGER/CONDBR2
PerfMonMTSvc   INFO preLoadProxy   1         12527.71       0              0                   DetCondKeyTrans[/LAR/Align]
PerfMonMTSvc   INFO preLoadProxy   1         12527.69       0              0                   UpdateAddr::/LAR/Align                                              
...
INFO=======================================================================================
PerfMonMTSvc  INFO Callback       1         57.50          0              0                   TRT_DetectorTool[0x3a859da8]+e9
PerfMonMTSvc  INFO Callback       1         42.41          0              0                   SCT_DetectorTool[0x40abd5b0]+e9
PerfMonMTSvc  INFO Callback       1         36.51          0              0                   PixelDetectorTool[0x40abe670]+e9
...
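
When only the job log is at hand, the component-level rows can be extracted with a small parser. The following is a minimal sketch, assuming the column layout shown in the example above (Step, Count, CPU Time, Vmem, Malloc, Component); the `ROW` pattern and the `parse_component_rows` helper are illustrative names, not part of PerfMonComps.

```python
import re
from collections import defaultdict

# Matches rows such as:
# "PerfMonMTSvc  INFO Execute  999  1136392.37  0  0  InDetSiSpTrackFinder"
ROW = re.compile(
    r"^PerfMonMTSvc\s+INFO\s+"
    r"(?P<step>[A-Za-z]\w*)\s+"
    r"(?P<count>\d+)\s+"
    r"(?P<cpu_ms>-?\d+(?:\.\d+)?)\s+"
    r"(?P<vmem_kb>-?\d+)\s+"
    r"(?P<malloc_kb>-?\d+)\s+"
    r"(?P<component>\S.*?)\s*$"
)


def parse_component_rows(lines):
    """Group component-level rows by step; lines that do not match are skipped."""
    by_step = defaultdict(list)
    for line in lines:
        m = ROW.match(line)
        if m is None:
            continue  # separators, headers, and other log lines
        by_step[m["step"]].append({
            "count": int(m["count"]),
            "cpu_ms": float(m["cpu_ms"]),
            "vmem_kb": int(m["vmem_kb"]),
            "malloc_kb": int(m["malloc_kb"]),
            "component": m["component"],
        })
    return dict(by_step)


# A few rows copied from the example log above
sample = """\
PerfMonMTSvc     INFO Execute        999       1136392.37     0              0                   InDetSiSpTrackFinder
PerfMonMTSvc    INFO Finalize       1         53148.94       0              -125681             ToolSvc
PerfMonMTSvc  INFO Callback       1         57.50          0              0                   TRT_DetectorTool[0x3a859da8]+e9
INFO=======================================================================================
""".splitlines()

for step, rows in parse_component_rows(sample).items():
    for row in rows:
        per_call = row["cpu_ms"] / row["count"]
        print(f"{step:<10} {row['component']:<35} {per_call:.2f} ms per call")
```

Dividing the CPU time by the call count gives a per-call cost, which is easier to compare between jobs that process different numbers of events.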

Note

Component-level measurements are collected without locks or any other synchronization, and the metrics are aggregated across all calls to a component. As a result, the Vmem and Malloc metrics inside the event loop are only available in serial mode (--threads=1).

Analyzing PerfMon Output

By default, the collected information is written to the perfmonmt_jobname.json.tar.gz file, which can be analyzed with the scripts provided by PerfMonComps:

  • perfmonmt-plotter.py - Creates event/component level plots
  • perfmonmt-printer.py - Prints untrimmed/ordered usage statistics
  • perfmonmt-refit.py - Re-fits memory usage in different event slices
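
To illustrate what re-fitting memory usage in an event slice means, the sketch below fits a straight line to memory versus event number and reports its slope as a per-event growth estimate. It is a plain least-squares fit written for illustration, not the implementation used by perfmonmt-refit.py or by the service itself; `fit_slope` and `slope_in_slice` are hypothetical helper names.

```python
def fit_slope(events, memory_kb):
    """Least-squares slope of memory versus event number, in kB per event."""
    n = len(events)
    if n < 2:
        raise ValueError("need at least two measurements to fit a slope")
    mean_x = sum(events) / n
    mean_y = sum(memory_kb) / n
    sxx = sum((x - mean_x) ** 2 for x in events)
    if sxx == 0:
        raise ValueError("all measurements share the same event number")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(events, memory_kb))
    return sxy / sxx


def slope_in_slice(measurements, first_event, last_event):
    """Fit only the (event, memory) pairs with first_event <= event <= last_event."""
    selected = [(e, m) for e, m in measurements if first_event <= e <= last_event]
    return fit_slope([e for e, _ in selected], [m for _, m in selected])


# (event, Pss [kB]) pairs copied from the Event Level Monitoring example log
pss = [
    (1, 4733328), (2, 7458990), (25, 9764130), (45, 10066578),
    (68, 10106510), (92, 10118229), (111, 10155259), (133, 10172782),
    (153, 10324078), (177, 10330558), (982, 11080707),
]

# Skip the first two events, where memory is still ramping up
print(f"Pss growth: {slope_in_slice(pss, 25, 982):.1f} kB/event")
```

The result will not match the leak estimate printed in the log: that estimate uses the last 37 measurements and omits events before the first 300, whereas the log prints only the first ten measurements and the last one. The full set of measurements is needed for a comparable fit.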

Here is an example of snapshot metrics from perfmonmt-plotter.py:

[Figure: PerfMonMT memory snapshot]
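
For ad hoc analysis outside the provided scripts, the archive can also be opened directly with the Python standard library. The sketch below assumes only that the .tar.gz file contains a JSON member; it makes no assumption about the layout of the JSON and simply prints the key hierarchy so that the structure can be explored. `load_perfmon_json` and `describe` are illustrative helper names.

```python
import json
import tarfile


def load_perfmon_json(archive_path):
    """Return the parsed content of the first .json member of a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".json"):
                with tar.extractfile(member) as fh:
                    return json.load(fh)
    raise FileNotFoundError(f"no .json member found in {archive_path}")


def describe(data, depth=2, indent=0):
    """Print the key hierarchy down to `depth` levels without assuming a schema."""
    if not isinstance(data, dict) or depth == 0:
        return
    for key, value in data.items():
        print(" " * indent + f"{key} ({type(value).__name__})")
        describe(value, depth - 1, indent + 2)


# Example usage, with the file name pattern used in this section:
#   describe(load_perfmon_json("perfmonmt_jobname.json.tar.gz"))
```

Inspecting the keys first avoids hard-coding a structure that may differ between releases.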

Configuring Domains in Component Accumulator

Algorithms are added to a PerfMon domain by calling flagPerfmonDomain with the domain name before merging their configuration, as in the following example:

# Muon
acc.flagPerfmonDomain('Muon')
if flags.Detector.EnableMuon:
    from MuonConfig.MuonReconstructionConfig import MuonReconstructionCfg
    acc.merge(MuonReconstructionCfg(flags))
    log.info("---------- Configured muon tracking")

# EGamma
acc.flagPerfmonDomain('EGamma')
if flags.Reco.EnableEgamma:
    from egammaConfig.egammaSteeringConfig import EGammaSteeringCfg
    acc.merge(EGammaSteeringCfg(flags))
    log.info("---------- Configured e/gamma")

# Caching of CaloExtension for downstream
# Combined Performance algorithms
acc.flagPerfmonDomain('CaloExtension')
if flags.Reco.EnableCaloExtension:
    from TrackToCalo.CaloExtensionBuilderAlgCfg import CaloExtensionBuilderCfg
    acc.merge(CaloExtensionBuilderCfg(flags))
    log.info("---------- Configured track calorimeter extension builder")

# Muon Combined
acc.flagPerfmonDomain('CombinedMuon')
if flags.Reco.EnableCombinedMuon:
    from MuonCombinedConfig.MuonCombinedReconstructionConfig import (
        MuonCombinedReconstructionCfg)
    acc.merge(MuonCombinedReconstructionCfg(flags))
    log.info("---------- Configured combined muon reconstruction")
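
The pattern above, flagging a domain and then merging configurations, can be pictured with a toy model. The class below is NOT the Athena ComponentAccumulator and does not reproduce its implementation. It is a sketch under the assumption that flagging a domain sets a "current domain", and that algorithms added or merged afterwards are recorded under it. The algorithm names are hypothetical.

```python
from collections import Counter


class ToyAccumulator:
    """Toy stand-in for a ComponentAccumulator (illustration only)."""

    def __init__(self):
        self._current_domain = None
        self.domains = {}  # algorithm name -> domain name (or None)

    def flagPerfmonDomain(self, name):
        # Everything added or merged from now on belongs to this domain.
        self._current_domain = name

    def addEventAlgo(self, alg_name):
        self.domains.setdefault(alg_name, self._current_domain)

    def merge(self, other):
        # Assumption: algorithms without a domain inherit the current one.
        for alg_name, domain in other.domains.items():
            self.domains.setdefault(alg_name, domain or self._current_domain)


def toy_muon_cfg():
    """Hypothetical sub-configuration with two algorithms and no domain."""
    sub = ToyAccumulator()
    sub.addEventAlgo("HypotheticalMuonSegmentAlg")
    sub.addEventAlgo("HypotheticalMuonTrackAlg")
    return sub


acc = ToyAccumulator()
acc.flagPerfmonDomain("Muon")
acc.merge(toy_muon_cfg())
acc.flagPerfmonDomain("EGamma")
acc.addEventAlgo("HypotheticalEGammaAlg")

per_domain = Counter(acc.domains.values())
print(f"{len(acc.domains)} registered algorithms in {len(per_domain)} domains")
for domain, count in per_domain.items():
    print(f"  {domain}: {count}")
```

In this model the order of calls matters: a configuration merged before the first flagPerfmonDomain call would end up without a domain.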

The configured domains are listed in the log:

Py:ComponentAccumulator INFO :: This CA contains the following PerfMon domains ::
Py:ComponentAccumulator INFO :: There are a total of 626 registered algorithms in 22 domains ::