# Callgrind
## Introduction
Callgrind is a tool that uses Valgrind's runtime code instrumentation framework to generate call graphs. Valgrind is a kind of emulator or virtual machine: it uses JIT (just-in-time) compilation techniques to translate the program's x86 instructions into a simpler intermediate representation on which the various tools operate. The code processed by the tools is then translated back into x86 instructions and executed on the host CPU. This way even shared libraries and dynamically loaded plugins can be analyzed, but the approach comes at the cost of a large slowdown of the analyzed application (about a factor of 50 for the callgrind tool) and high memory consumption.
## Simple use case
- Prepare your development area as usual. You need to run Athena in debug mode only if you want a detailed line-by-line profile; call graphs can be produced in optimized mode as well.
- Run Athena with Valgrind:

  ```
  valgrind --tool=callgrind --trace-children=yes $(which athena.py) your_Job.py
  ```

  - The `--tool` option determines which tool is executed, callgrind in our case.
  - The `--trace-children` option tells Valgrind to analyze child processes of the main application as well; otherwise you will only get profiles of the bash sessions, which is probably not what you want.
  - Depending on the system, you might also want to use `--enable-debuginfod=no`, i.e. to avoid a service providing debug information over an HTTP API.
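If you launch such jobs repeatedly, the command line above can be assembled in a small script. The following is a minimal Python sketch, not part of Athena: the helper name `build_callgrind_command` and its keyword arguments are hypothetical, and it only uses the Valgrind options already mentioned above.

```python
import shlex
import shutil


def build_callgrind_command(job_options, executable="athena.py",
                            trace_children=True, debuginfod=False):
    """Assemble a callgrind command line (hypothetical helper, not an Athena API)."""
    # Resolve the full path like $(which athena.py); fall back to the bare
    # name if the executable is not on PATH (e.g. outside an Athena setup).
    resolved = shutil.which(executable) or executable
    cmd = ["valgrind", "--tool=callgrind"]
    if trace_children:
        # Follow child processes instead of profiling only the wrapper shell.
        cmd.append("--trace-children=yes")
    if not debuginfod:
        # Do not query a debuginfod server for debug information.
        cmd.append("--enable-debuginfod=no")
    cmd += [resolved, job_options]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_callgrind_command("your_Job.py")))
```

The sketch only prints the command; passing the returned list to `subprocess.run` would start the job.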
## Profiling selected algorithms
Profiling an entire Athena job is not only very slow, the resulting profiles
can also be huge (easily more than 100 MB for a few events).
This can make it difficult to analyze the results using `KCacheGrind`. Moreover,
developers might only be interested in their own algorithm. This is where the
ValgrindAuditor (part of Control/Valkyrie) can help. Once configured
with the name of an algorithm to profile, it turns the callgrind instrumentation
on before the algorithm's execute method and off again afterwards. While
the instrumentation is off, the remaining Valgrind overhead should only be
about a factor of 4, which makes it much easier to run an entire Athena job in a reasonable amount of time.
To enable the ValgrindAuditor, add the following lines to your component accumulator job:

```
# EMBremCollectionBuilder is an example; replace as appropriate
flags.PerfMon.Valgrind.ProfiledAlgs=["EMBremCollectionBuilder"]
from Valkyrie.ValkyrieConfig import ValgrindServiceCfg
acc.merge(ValgrindServiceCfg(flags))
```
Set `flags.PerfMon.Valgrind.ProfiledAlgs` to the name of the algorithm you want to
profile (you can of course list multiple algorithms). Sometimes you might want
to skip a few events before collecting profiling data, e.g. to exclude first-event initializations, `ld`
symbol lookups, etc. This can be done by setting `IgnoreFirstNEvents`. For complete documentation of all
the `ValgrindSvc` properties see the Valkyrie doxygen page.
Before you run your job in Valgrind, you might simply want to run Athena with your modified job options. If everything works (and you haven't made a mistake in the algorithm name), you should see output like the following:

```
ValgrindAuditor VERBOSE Starting callgrind: EMBremCollectionBuilder [event 1]
ValgrindAuditor VERBOSE Stopping callgrind: EMBremCollectionBuilder [event 1]
```
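For longer logs, this check can be automated. The sketch below is an illustration only: it assumes the auditor messages have exactly the form shown above (auditor name, `VERBOSE`, `Starting`/`Stopping callgrind:`, algorithm name, event number), and the function name `profiled_events` is hypothetical.

```python
import re

# Matches the auditor messages shown above, e.g.
# "ValgrindAuditor VERBOSE Starting callgrind: EMBremCollectionBuilder [event 1]"
# (assumed format; whitespace between the columns may vary).
AUDITOR_LINE = re.compile(
    r"ValgrindAuditor\s+VERBOSE\s+(Starting|Stopping) callgrind: (\S+) \[event (\d+)\]"
)


def profiled_events(log_lines):
    """Return {algorithm: [event numbers]} for each Starting/Stopping pair found."""
    open_starts = {}  # algorithm -> event number of the pending "Starting" line
    completed = {}    # algorithm -> events that have both a start and a stop
    for line in log_lines:
        match = AUDITOR_LINE.search(line)
        if not match:
            continue
        action, alg, event = match.group(1), match.group(2), int(match.group(3))
        if action == "Starting":
            open_starts[alg] = event
        elif open_starts.get(alg) == event:
            # A "Stopping" line that matches the pending start closes the pair.
            completed.setdefault(alg, []).append(event)
            del open_starts[alg]
    return completed


if __name__ == "__main__":
    sample = [
        "ValgrindAuditor VERBOSE Starting callgrind: EMBremCollectionBuilder [event 1]",
        "ValgrindAuditor VERBOSE Stopping callgrind: EMBremCollectionBuilder [event 1]",
    ]
    print(profiled_events(sample))
```

An empty result for your algorithm suggests the auditor never fired, for instance because of a misspelled algorithm name.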
Then run the same job under Valgrind:

```
valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --enable-debuginfod=no $(which athena.py) --imf your_Job.py
```
An example using `Reco_tf` and some more callgrind options:

```
InputRDOFile="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc20/RDO/mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_s3681_r13145/100events.RDO.pool.root"
valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --cacheuse=yes --cache-sim=yes \
  --simulate-wb=yes --simulate-hwpref=yes --branch-sim=yes --dump-instr=yes --enable-debuginfod=no $(which Reco_tf.py) \
  --inputRDOFile "$InputRDOFile" \
  --outputAODFile myAOD.pool.root \
  --preInclude 'egammaConfig.egammaOnlyFromRawFlags.egammaOnlyFromRaw' \
  --preExec 'ConfigFlags.PerfMon.Valgrind.ProfiledAlgs=["EMBremCollectionBuilder"]' \
  --postInclude 'Valkyrie.ValkyrieConfig.ValgrindServiceCfg' \
  --autoConfiguration 'everything' \
  --maxEvents '20' \
  --fileValidation FALSE \
  --perfmon 'none'
```
The most important option is `--instr-atstart=no`. This turns the
instrumentation off at the beginning so that it can be turned on by ValgrindAuditor
on the first event of the algorithm(s) being profiled. See
the [Valgrind manual for other callgrind options](https://valgrind.org/docs/manual/cl-manual.html#cl-manual.options). After
the job is done you will find several `callgrind.out` files, of which the largest is the one you are interested in.
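Picking out the largest output file can be scripted. This is a minimal Python sketch with a hypothetical helper name, `largest_callgrind_output`; it assumes the output files sit in one directory and keep callgrind's default `callgrind.out.<pid>` naming.

```python
from pathlib import Path


def largest_callgrind_output(directory="."):
    """Return the largest callgrind.out* file in `directory`, or None if there is none."""
    candidates = [p for p in Path(directory).glob("callgrind.out*") if p.is_file()]
    # The largest file is expected to hold the actual Athena profile.
    return max(candidates, key=lambda p: p.stat().st_size, default=None)


if __name__ == "__main__":
    print(largest_callgrind_output())
```

The returned path can then be opened directly in the visualization tool.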
## Profiling selected pieces of code
The above only works for whole algorithms. To profile a smaller piece of code, for example a single method, you need to do a bit more:
- in the `CMakeLists.txt` file add: `find_package( valgrind )`
- in the file to be profiled, add `#include "valgrind/callgrind.h"`
- at the start of the piece of code to be profiled, add `CALLGRIND_START_INSTRUMENTATION;` and at the end add `CALLGRIND_STOP_INSTRUMENTATION;`
After compiling and setting up your code, you can run callgrind as before:
`valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --enable-debuginfod=no $(which athena.py) --imf your_Job.py`
There's more in the relevant [section](https://valgrind.org/docs/manual/cl-manual.html#cl-manual.limits) of the valgrind manual.
## `KCacheGrind`
Data produced by callgrind can be loaded into the [KCacheGrind](https://kcachegrind.github.io/html/Home.html)
tool for browsing the performance results. The actual profile of the Athena
run is stored in the largest file produced by callgrind. On
lxplus `KCacheGrind` is already installed, so no special setup is needed. You can confirm that it is available with the following command, which prints the installed KCachegrind version:

```
$(which kcachegrind) --version
```