Grid Driver
You can run your algorithm on the Grid through built-in functionality.
In a new shell, navigate to AnalysisTutorial. Set up your analysis release
(asetup) and additionally set up panda.
The nice thing about using EventLoop (and Athena) is that you don't
have to change any of your algorithm code; you simply change the driver
in the steering script. It is recommended that you use a separate submit
script when running on the Grid. Let's copy the content of ATestRun_eljob.py
into a new file, ATestSubmit_eljob.py.
Tip
If you copy the file into MyAnalysis/share, you will need to recompile
before ATestSubmit_eljob.py can be executed as a command.
First, we need to tell SampleHandler how to find the input file(s) on the
Grid. In ATestSubmit_eljob.py, comment out the directory scan and instead
scan using Rucio.
Tip
Since we are only testing this functionality, we will use a very small input dataset so that your job runs quickly and you get fast feedback on whether it succeeded.
# sample = ROOT.SH.SampleLocal("dataset")
# sample.add (testFile)
# sh.add (sample)
ROOT.SH.scanRucio(sh , 'mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000.deriv.DAOD_PHYS.e7587_a907_r14861_p6117')
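If you later want to run over more than one dataset, one option is to keep the dataset names in a list and scan each of them in turn. The sketch below is only an illustration: it assumes that ROOT.SH.scanRucio can be called repeatedly on the same SampleHandler, and the helper name scan_datasets and the stand-in fake_scan function are hypothetical, used here so that the snippet runs without ROOT.

```python
def scan_datasets(scan, sample_handler, dataset_names):
    """Call scan(sample_handler, name) once per unique, non-empty name.

    In the steering script, `scan` would be ROOT.SH.scanRucio and
    `sample_handler` the `sh` object (assumption: repeated calls are fine).
    Returns the names that were scanned, in order.
    """
    scanned = []
    for name in dataset_names:
        name = name.strip()
        if not name or name in scanned:
            continue  # skip blank entries and duplicates
        scan(sample_handler, name)
        scanned.append(name)
    return scanned


# Hypothetical stand-in for ROOT.SH.scanRucio; it only records the name.
def fake_scan(sample_handler, name):
    sample_handler.append(name)


# The dataset name used in this tutorial, split only for readability.
DATASET = (
    "mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000"
    ".deriv.DAOD_PHYS.e7587_a907_r14861_p6117"
)

sh_stub = []  # stands in for the SampleHandler object
scanned = scan_datasets(fake_scan, sh_stub, [DATASET, "  " + DATASET + "\n", ""])
print(len(scanned))  # 1
```

In ATestSubmit_eljob.py the call would then read scan_datasets(ROOT.SH.scanRucio, sh, my_datasets), where my_datasets is your own list of dataset names.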
Next, replace the driver with the PrunDriver:
# driver = ROOT.EL.DirectDriver()
driver = ROOT.EL.PrunDriver()
We need to specify a structure for the output dataset name because
our input sample has a very long name. By default, the output dataset
name contains (among other strings) the input dataset name, which makes
it too long for the Grid to handle. So after you've defined the
PrunDriver, add:
driver.options().setString("nc_outputSampleName", "user.%nickname%.grid_test_run")
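To see why the explicit pattern helps, the sketch below mimics the substitution of the %nickname% placeholder with a plain string replacement and compares the result with the length of the input dataset name. This is only an illustration, not how EventLoop performs the expansion internally; the helper expand_output_name and the nickname "jdoe" are hypothetical.

```python
# The input dataset name used in this tutorial, split only for readability.
INPUT_DATASET = (
    "mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000"
    ".deriv.DAOD_PHYS.e7587_a907_r14861_p6117"
)
PATTERN = "user.%nickname%.grid_test_run"


def expand_output_name(pattern, nickname):
    """Replace the %nickname% placeholder (illustrative helper only)."""
    return pattern.replace("%nickname%", nickname)


output_name = expand_output_name(PATTERN, "jdoe")  # "jdoe" is a made-up nickname
print(output_name)  # user.jdoe.grid_test_run
print(len(output_name) < len(INPUT_DATASET))  # True
```

The expanded name stays short regardless of how long the input dataset name is, because the pattern does not include it.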
In the new file, change driver.submit() to driver.submitOnly():
#driver.submit( job, options.submitDir )
driver.submitOnly( job, options.submitDir )
This should again be the last line of the file.
Tip
With the submit() command, the script will wait until the Grid jobs are
finished, which we don't want. The submitOnly() command will launch the
jobs and then return control.
The PrunDriver supports a number of optional configuration options
that you might recognize from the prun program. If you need more
information on the options available for running on the Grid, check out
the Grid driver documentation.
Tip
For example, a useful (and actually recommended) option is to merge the output on the Grid so that it ends up in a smaller number of files. This can be important when you run over a large input dataset and the Grid splits the processing task into multiple jobs.
driver.options().setString( "nc_mergeOutput", "true" )
Finally, submit the jobs as before:
ATestSubmit_eljob.py --config-path=../source/MyAnalysis/data/config.yaml
This job submission process may take a while to complete - do not interrupt it! If you don't yet have a proxy for authentication, you will be prompted to enter your Grid certificate password.
Tip
EventLoop runs quite a bit of configuration before submitting
the job. If you have some configuration that depends on your input
file, make sure that it is being run in the job itself, and not in
the configuration step that EventLoop runs before submitting things.
You'll notice that EventLoop does not print a PanDA task ID, so you
will need to search for your tasks to find the job that was submitted.
Running with Athena
If you are using Athena, you can learn how to submit jobs to the Grid using
pathena at this link.