CPRun.py

What does CPRun.py offer?

At this point you should have run CPRun.py with different YAML configurations. CPRun.py is the recommended tool to run CP algorithms for analyses. It is a command line tool written in Python that simplifies the process of setting up and executing analysis jobs using the ATLAS software framework. It helps analysts to focus on the analysis configuration rather than the intricacies of job configuration and execution.

This page serves as both the tutorial and the documentation for CPRun.py. It explains the main features of CPRun.py and how to use them.

I/O handling

CPRun.py simplifies the input and output handling of your analysis jobs. In the simplest case, you only need the name of your input file and the analysis YAML configuration file. CPRun.py will take care of the rest.

Input (--input-list/-i) [mandatory]

The --input-list or -i option takes two kinds of input:

  • A text file containing a list of input files or directories (one per line).
  • A path to a single file, for quick tests.

An example of a text file containing a list of input files:

/afs/cern.ch/user/u/username/vbs_analysis/WZQCD/
/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/ASG/DAOD_PHYS/p6697/mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13145_r13146_p6697/DAOD_PHYS.43713321._000003.pool.root.1
All the ROOT files in the directory WZQCD/ will be processed, along with the file DAOD_PHYS.43713321._000003.pool.root.1.
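
To make the input-list format concrete, here is a minimal Python sketch of how such a list could be expanded into individual files. It is an illustration only, not CPRun.py's actual implementation: the helper name expand_input_list, the non-recursive directory scan, and the "*.root*" name pattern are all assumptions made for this example.

```python
from pathlib import Path


def expand_input_list(list_file):
    """Return the ROOT files named by an input list (hypothetical helper).

    Each non-empty line is either a file path, kept as-is, or a directory,
    replaced by the files directly inside it whose names contain ".root".
    """
    files = []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        path = Path(entry)
        if path.is_dir():
            # Assumption: non-recursive scan; "*.root*" also matches
            # names such as "file.pool.root.1".
            matches = (p for p in path.glob("*.root*") if p.is_file())
            files.extend(sorted(str(p) for p in matches))
        else:
            files.append(str(path))
    return files
```

A helper like this can be handy for checking which files a list would pick up before submitting a long job.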

Output (--output/-o)

The --output or -o option specifies the output file name. If not provided, it defaults to output.root.

Analysis configuration YAML file

text config (--text-config/-t) [mandatory]

The --text-config or -t option specifies the analysis configuration YAML file. This file contains all the settings for your analysis; see the other pages in this tutorial for more details about the configuration options.

The -t option is mandatory: you must provide it every time you run CPRun.py. It accepts three kinds of input:

  • config.yaml: looks in the current directory for a file named config.yaml, and also in the build/analysis_package/data directory of your installed analysis package.

Tip

Use atlas_install_data(path/to/*.yaml) in CMakeLists.txt to install your YAML files to the data directory.

  • analysis_package/config.yaml: looks for config.yaml in the directory of the specified analysis package. This is useful when you have multiple analysis packages installed together.
  • /full/path/to/your/config.yaml: looks for config.yaml at the specified full path.

Tip

Protection against duplicated YAML config names is in place.
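
The three forms accepted by -t can be told apart purely from the shape of the argument. The following Python sketch shows one way to do that. It is a hedged illustration: classify_text_config is a hypothetical helper, it only inspects the string, and it does not reproduce how CPRun.py actually searches for the file.

```python
from pathlib import PurePosixPath


def classify_text_config(arg):
    """Classify a -t argument into one of the three forms listed above.

    Hypothetical helper for illustration; it never touches the file system.
    """
    path = PurePosixPath(arg)
    if path.is_absolute():
        # e.g. "/full/path/to/your/config.yaml"
        return "full path"
    if len(path.parts) > 1:
        # e.g. "analysis_package/config.yaml"
        return "package-qualified"
    # e.g. "config.yaml": current directory or an installed data directory
    return "bare file name"
```

For instance, classify_text_config("analysis_package/config.yaml") returns "package-qualified", while classify_text_config("config.yaml") returns "bare file name".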

Other useful options

--no-systematics option

Turn off the systematics, overriding the YAML configuration file setting.

--max-events/-e option

Number of events to run in one job. Default is -1, meaning all events.

Duality of Athena and EventLoop

CPRun.py supports both Athena and EventLoop execution modes automatically. Each mode has its own specific options; run CPRun.py -h under the corresponding framework to see them.

EventLoop-specific options

work directory (--work-dir)

By default, CPRun.py output in EventLoop mimics the Athena output structure: output files are created at the top level, and the work directory contains only metadata. If you want a more "authentic" EventLoop output structure, use the --work-dir directory_name option to create a work directory. All output files will then be created inside the work directory.

merging histograms and trees (--merge-output-files)

Merges histogram.root into output.root, mimicking the Athena output structure. This can be useful for downstream software that expects only one output file.

direct driver (--direct-driver)

Use the direct driver instead of the batch driver.

full configuration log (--dump-full-config)

Dump the full configuration to a text file named full_config.txt. This can be useful for debugging purposes.

Debugging tools

  • --run-perf-stat: Run xAOD::PerfStats to get input branch access data. This is mostly useful for AMG experts wanting to understand branch access patterns.
  • --algorithm-timers: Enable algorithm timers. This is mostly useful for AMG experts wanting to understand tool performance.
  • --algorithm-memory-monitoring: Enable algorithm memory monitoring. This is mostly useful for AMG experts wanting to understand tool memory usage. Note that this is imperfect and may in some cases assign memory to the wrong algorithm.

Athena-specific options

As of today there are no Athena-specific options in CPRun.py. Feel free to open a request if you have any suggestions.

Example usage

The simplest way you can run CPRun.py is:

setupATLAS
asetup AnalysisBase,main,latest
CPRun.py -i $ASG_TEST_FILE_MC -t config.yaml
where $ASG_TEST_FILE_MC is a handy ROOT file for testing, and config.yaml is the file you prepared in the previous pages of this tutorial. Another commonly used example for testing is:
CPRun.py -i $ASG_TEST_FILE_MC -t config.yaml -e 1000 --no-systematics
This runs only 1000 events without systematics, so it finishes very quickly.
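
If you launch CPRun.py from a script, it can help to assemble the command line programmatically. The sketch below is a minimal example under stated assumptions: build_cprun_command is a hypothetical helper, inputs.txt is a placeholder file name, and only options documented on this page are used.

```python
import shlex


def build_cprun_command(input_list, text_config, output=None,
                        max_events=None, no_systematics=False):
    """Assemble a CPRun.py argument list (hypothetical helper)."""
    # -i and -t are mandatory, so they are always present.
    cmd = ["CPRun.py", "-i", input_list, "-t", text_config]
    if output is not None:
        cmd += ["-o", output]
    if max_events is not None:
        cmd += ["-e", str(max_events)]
    if no_systematics:
        cmd.append("--no-systematics")
    return cmd


# "inputs.txt" is a placeholder name for an input list file.
cmd = build_cprun_command("inputs.txt", "config.yaml",
                          max_events=1000, no_systematics=True)
print(shlex.join(cmd))
# prints: CPRun.py -i inputs.txt -t config.yaml -e 1000 --no-systematics
```

The resulting list can be passed to subprocess.run(cmd, check=True) in a shell session where the ATLAS release has already been set up, so that CPRun.py is on the PATH.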