DAS-ize episodes
jmhogan committed Dec 16, 2024
1 parent e0f4707 commit a7592d3
Showing 8 changed files with 45 additions and 143 deletions.
40 changes: 4 additions & 36 deletions episodes/02-cpp-hello-world.md
@@ -16,44 +16,12 @@ exercises: 10

::::::::::::::::::::::::::::::::::::::::::::::::

## Setting up your working area

If you completed the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/)
you should already have worked through
[this episode](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/03-docker-for-cms-opendata/index.html), under **Download the docker images for ROOT and python tools and start container**, and you will have

- a working directory `cms_open_data_root` on your local computer
- a docker container with name `my_root` created with the working directory `cms_open_data_root` mounted into the `/code` directory of the container.

Start your ROOT container with

```bash
docker start -i my_root
```

In the container, you will be in the `/code` directory and it shares the files with your local `cms_open_data_root` directory.

:::::::::::: callout

## If you're using apptainer:

Whenever you see a `docker start` instruction, replace it with `apptainer shell` to open either the ROOT or Python container image.
The specific commands in this pre-exercise and during the live workshop will be given for docker, since that is the most common application.
As a general rule, editing of files will be done in the standard terminal (the containers do not have all text editors!) or via the jupyter-lab interface, and then commands will be executed inside the container shell. If you see `Singularity>` on your command line, you are ready to run a ROOT or python script.

:::::::::::::

## Your first C/C++ program (Optional Review!)

Let's start with writing a simple `hello world` program in C. First we'll edit the
*source* code with an editor of your choice.

Note that you will
*edit* the file in a local terminal on your computer and then *run* the file
in the Docker environment. This is because we mounted the `cms_open_data_root` directory
on your local disk such that it is visible inside the Docker container.

Let's create a new file called `hello_world.cc` in the `cms_open_data_root` directory, using your preferred editor.
*source* code with an editor of your choice.
Go to your [working directory for this exercise](../learners/setup.md), and let's create a
new file called `hello_world.cc`, using your preferred editor.

The first thing we need to do is `include` some standard libraries. These libraries
allow us to access the C and C++ commands to print to the screen (`stdout` and `stderr`) as
@@ -163,7 +131,7 @@ Hello world! This uses the C++ 'iostream' library to direct output to standard o
Hello world! This uses the C++ 'iostream' library to direct output to standard error.
```

When you are working with the Open Data, you will be looping over events
When you are working with CMS data, you might be looping over events
and may find yourself making selections based on certain physics criteria.
To that end, you may want to familiarize yourself with the C++ syntax for
[loops](https://www.w3schools.com/cpp/cpp_for_loop.asp)
16 changes: 4 additions & 12 deletions episodes/03-root-and-cpp-read-and-write.md
@@ -79,14 +79,6 @@ for a separate workshop, but much of the material is relevant for the Open Data
to complete the tutorial.
* [ROOT tutorial from Nevis Lab (Columbia Univ.)](https://www.nevis.columbia.edu/~seligman/root-class/). Very complete and always up-to-date tutorial from our friends at Columbia.

::::::::::::::::::: callout
## Be in the container!
For this episode, you'll still be running your code from the `my_root` docker container
that you launched in the previous episode.

As you edit the files though, you may want to do the editing from your *local* environment,
so that you have access to your preferred editors.
:::::::::::::::::::::

## ROOT terminology

@@ -434,12 +426,12 @@ of what you just ran.
Huzzah! You've successfully written your first ROOT file!

:::::::::::::::::::: callout
## Will I have to `make` my Open Data analysis code?
## Will I have to `make` my analysis code?

Maybe!

- If you prefer to write your end-level analysis code in C++, your `make` setup will be very similar to this exercise
- If you are analyzing AOD or MiniAOD data and using the dedicated CMS software, a configuration and build system called [SCRAM](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideScram) is used to compile and link code.
- If you are analyzing AOD or MiniAOD data and using CMSSW software, a configuration and build system called [SCRAM](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideScram) is used to compile and link code.
- If you only analyze NanoAOD samples and do so in python (see the upcoming lesson pages!), then you will not use `make`
::::::::::::::::::::::

@@ -639,7 +631,7 @@ clean:
rm -f ./*~ ./*.o ./read_ROOT_file
```

We can now compile and run the code in your `my_root` container shell!
We can now compile and run the code!

```bash
make read_ROOT_file
@@ -691,7 +683,7 @@ still using the C++ syntax.

:::::::::::::::::::::::::::::: keypoints

- ROOT defines the file format in which all of the CMS Open Data is stored.
- ROOT defines the file format in which all of the CMS data is stored.
- These files can be accessed quickly using C++ code and the relevant information can be dumped out into other formats.

::::::::::::::::::::::::::::::
22 changes: 6 additions & 16 deletions episodes/04-root-and-cpp-fill-a-histogram.md
@@ -29,7 +29,7 @@ we have to a new file, `fill_histogram.cc`.
cp read_ROOT_file.cc fill_histogram.cc
```

Into this file, we'll add some lines at some key spots. Again, use your favourite editor on your local computer. For now, we'll go through those lines
Into this file, we'll add some lines at some key spots. For now, we'll go through those lines
of code individually, and then show you the completed file at the end to see where they went.

First we need to include the header file for the ROOT [TH1F](https://root.cern.ch/doc/master/classTH1F.html) class.
@@ -193,7 +193,7 @@ clean:
rm -f ./*~ ./*.o ./fill_histogram
```

And then compile and run it. Remember to do it in the container!
And then compile and run it:

```bash
make fill_histogram
@@ -203,14 +203,6 @@ make fill_histogram
The output on the screen should not look different. However, if you list the contents of the directory,
you'll see a new file, `output.root`!

If you are using a container with VNC, now it is time to start the graphics window with

```bash
start_vnc
```

and connect to it with the default password `cms.cern`.

To inspect this new ROOT file, we'll launch CINT for the first time and create a
[`TBrowser` object](https://root.cern.ch/doc/master/classTBrowser.html).

@@ -234,7 +226,8 @@ to inspect ROOT files. Inside this CINT environment, type the following
root [1] TBrowser b;
```

You should see the `TBrowser` pop up!
You should see the `TBrowser` pop up! If no browser pops up, check out the [uscms.org page about the LPC](https://uscms.org/uscms_at_work/computing/getstarted/uaf.shtml)
to troubleshoot your X11 connection, or ask for help on Mattermost.
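
Before digging into the X11 documentation, one quick thing to look at is whether the `DISPLAY` environment variable is set in your session, since X11 forwarding over `ssh -XY` normally sets it. The following is a minimal sketch using only the Python standard library. The helper name `describe_display` is hypothetical, and an unset `DISPLAY` is only one of several possible reasons for a missing window.

```python
import os


def describe_display(env=None):
    """Return a short message about the DISPLAY variable in the given environment."""
    # Default to the real process environment; a plain dict can be passed instead.
    env = os.environ if env is None else env
    display = env.get("DISPLAY", "")
    if display:
        return f"DISPLAY is set to '{display}'; X11 forwarding may be working."
    return "DISPLAY is not set; graphical windows such as TBrowser cannot open."


if __name__ == "__main__":
    print(describe_display())
```

If the variable is not set, reconnecting with the `-XY` options is a reasonable first thing to try.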

:::::::::::::::::::: callout

@@ -269,16 +262,15 @@ Open the `tree.root` file with ROOT:
root -l tree.root
```

Now, dump the content of the `t1` tree with the method `Print`. Note that, by opening the file, the ROOT tree in there is automatically loaded.
You can dump the content of the `t1` tree with the method `Print`. Note that, by opening the file, the ROOT tree in there is automatically loaded.

```cpp
root [0]
Attaching file tree.root as _file0...
root [1] t1->Print()
```

Please copy the output this statement generates and paste it into the corresponding section in our [assignment form](https://docs.google.com/forms/d/e/1FAIpQLSdxsc-aIWqUyFA0qTsnbfQrA6wROtAxC5Id4sxH08STTl8e5w/viewform); remember you must sign in and <strong style="color: red;">click on the submit button</strong> in order to save your work. You can go back to edit the form at any time.
Then, quit ROOT.
When you're done, quit ROOT.

::::::::::::::::::::::::::::::::

@@ -418,8 +410,6 @@ You'll be popped into the CINT environment and you should see the following plot

::::::::::::

Exit from the container. If you are using a container with VNC, first stop VNC with `stop_vnc`.

::::::::::::::::: keypoints

- You can quickly inspect your data using just ROOT
43 changes: 0 additions & 43 deletions episodes/05-using-root-with-python.md
@@ -40,53 +40,10 @@ data processing in python, and the scikit-HEP tools are very important for that

You can check out a tutorial for many of their tools [here](https://hsf-training.github.io/hsf-training-scikit-hep-webpage/).

Check warning on line 41 in episodes/05-using-root-with-python.md

GitHub Actions / Build Full Site

[uninformative link text]: [here](https://hsf-training.github.io/hsf-training-scikit-hep-webpage/)

## Using the Python docker container

The tools in the Python docker container will allow you to easily open
and analyze ROOT files. This is useful for when you make use of the CMS open data tools to skim
some subset of the open data and then copy it to your local laptop, desktop, or perhaps an
HPC cluster at your home institution.

If you completed the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/)
you should already have worked through
[this episode](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/03-docker-for-cms-opendata/index.html), under **Download the docker images for ROOT and python tools and start container**, and you will have

- a working directory `cms_open_data_python` on your local computer
- a docker container with name `my_python` created with the working directory `cms_open_data_python` mounted into the `/code` directory of the container.

Start your python container with

```bash
docker start -i my_python
```

In the container, you will be in the `/code` directory and it shares the files with your local `cms_open_data_python` directory.

:::::::::::: callout

## If you're using apptainer:

Whenever you see a `docker start` instruction, replace it with `apptainer shell` to open either the ROOT or Python container image.
The specific commands in this pre-exercise and during the live workshop will be given for docker, since that is the most common application.
As a general rule, editing of files will be done in the standard terminal (the containers do not have all text editors!) or via the jupyter-lab interface, and then commands will be executed inside the container shell. If you see `Singularity>` on your command line, you are ready to run a ROOT or python script.

:::::::::::::

If you want to test out the installation, from within Docker you can launch an
interactive python session by typing `python` (in Docker) and then trying

```python
import uproot
import awkward as ak
```

If you don't get any errors then congratulations! You have a working environment and you are ready to
perform some HEP analysis with your new python environment!

:::::::::::::: keypoints

- PyROOT is a complete interface to the ROOT libraries
- Scikit-HEP provides tools to interface between ROOT and global scientific python tools
- We will use `uproot`, `awkward`, and `vector` in our NanoAOD analysis

::::::::::::::
42 changes: 10 additions & 32 deletions episodes/06-uproot.md
@@ -32,56 +32,35 @@ from Jim Pivarski.

## How to type these commands?

Now that you've installed the necessary python modules in your `my_python` container you can choose to write and execute the code however
you like. There are a number of options, but we will point out two here.

* [Jupyter notebook](https://jupyter.org/). This provides an editor and an environment in which to run
your python code. Often you will run the code one *cell* at a time, but you could always put all your
code in one cell if you prefer. There are many, many tutorials out there on using Jupyter notebooks,
and if you choose to use Jupyter as your editing/executing environment, we assume that you have developed some
familiarity with it.

* In the `my_python` container, you can start `jupyter-lab` with
```bash
jupyter-lab --ip=0.0.0.0 --no-browser
```
and open the link given in the message on your browser. Choose the icon under "Notebook".

* Python scripts. In this approach, you edit the equivalent of a text file and then pass that text
file into a python interpreter. For example, if you edited a file called `hello_world.py` such that
it contained
There are many options for interacting with python scripts for CMS data analysis, including interactive tools like jupyter notebooks.
In this exercise, we will stick to editing python scripts. For example, if you edited a file called `hello_world.py` such that
it contained:

```python
print("Hello world!")
```

You could save the file and then (perhaps in another Terminal window), execute
You could save the file and then execute:

```bash
python hello_world.py
```

This would interpret your text file as python commands and produce the output
This would interpret your text file as python commands and produce the output:

```output
Hello world!
```
We leave it to you to decide which approach you prefer.

If you would prefer to use a jupyter notebook for these exercises, go to [CERN's SWAN facility](https://swan.web.cern.ch/swan/) and try the new interactive jupyter-lab interface (you can leave the other options to their defaults). From the options page, select a Python3 notebook.

## Open a file

Let's open a ROOT file!
If you're writing a python script, let's call it `open_root_file.py` and if you're using
a Jupyter notebook, let's call it `open_root_file.ipynb`. If you are working in the container, you will open and *edit* the python script on your local computer and *run* it in the container, or you will open a notebook on your jupyter-lab window in the browser.
Call your script `open_root_file.py` and open it in your preferred text editor. On this webpage we will show small snippets of Python that you can add to your script one after the other, and run to see new output.

First we will import the `uproot` library, as well as some other standard
libraries. These can be the first lines of your python script or the first cell of your Jupyter notebook.

*If this is a script, you may want to run `python open_root_file.py` every few lines or so to see the output.
If this is a Jupyter notebook, you will want to put each snippet of code in its own cell and execute
them as you go to see the output.*

libraries. These can be the first lines of your python script:

```python
import numpy as np
@@ -115,8 +94,7 @@ on the CERN Open Data Portal. If you scroll down to the bottom of the page and c
the **Download** button.

For the remainder of this tutorial you will want the file to be in the same directory/folder
as your python code, whether you are using a Jupyter notebook or a simple python script. So make
sure you move this file to that location after you have downloaded it.
as your python code. So make sure you move this file to that location after you have downloaded it.
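
A minimal sketch for checking this from Python, using only the standard library and assuming you run your script from the directory that contains it. The helper `find_input_file` and the file name `my_downloaded_file.root` are placeholders for illustration, not the actual name of the dataset file; substitute whatever you downloaded.

```python
from pathlib import Path


def find_input_file(filename, directory=None):
    """Return the full path to `filename` inside `directory`, or None if it is missing."""
    # With no directory given, look in the current working directory,
    # which is where relative file paths in your script are resolved.
    directory = Path.cwd() if directory is None else Path(directory)
    candidate = directory / filename
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Placeholder name: replace it with the file you actually downloaded.
    path = find_input_file("my_downloaded_file.root")
    if path is None:
        print("Input file not found here; move it next to your script first.")
    else:
        print(f"Found input file: {path}")
```

Running a check like this before opening the file can make a wrong-directory mistake easier to spot than a file-not-found traceback.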

To read in the file, you'll change one line to define the input file to be
```python
@@ -132,7 +110,7 @@ So you've opened the file with `uproot`. What is this `infile` object? Let's add
print(type(infile))
```

and we get
and upon running the script we get

```output
<class 'uproot.reading.ReadOnlyDirectory'>
4 changes: 2 additions & 2 deletions episodes/07-awkward.md
@@ -37,7 +37,7 @@ of your file.

## Environment

Use the Python environment that you set up in the previous two lesson pages, such as your `my_python` docker container.
Use the Python environment that you set up in the previous two lesson pages.
We leave it up to you whether or not you write and execute this code in a script or as a Jupyter notebook.

## Numpy arrays: a review
@@ -152,7 +152,7 @@ or by downloading it.
## *Stop!*

If you haven't already, make sure you have run through the
[previous lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-cpp-root-python/06-uproot/index.html) on working with uproot.
[previous lesson](06-uproot.md) on working with uproot.
::::::::::::::::::::::::

Let's open this ROOT file! If you're writing a python script, let's call it `open_root_file_and_analyze_data.py` and if you're using
2 changes: 1 addition & 1 deletion episodes/introduction.md
@@ -54,7 +54,7 @@ tutorials, for those who want to go further.
::::::::::::::::::::::::::::: callout
## You still have choices!

Just to emphasize, you really only *need* to use ROOT and C++ at the early stages of analyzing CMS Open Data in the AOD (Run 1) or MiniAOD (Run 2) formats. These datasets require using CMS-provided tools that perform much better in C++ than python. However, downstream in your analysis or to analyze Run 2 NanoAOD files, you are welcome to use whatever tools and file formats you choose.
Just to emphasize, you really only *need* to use ROOT and C++ at the early stages of analyzing CMS Open Data in the AOD (Run 1) or MiniAOD (Run 2 or Run 3) formats. These datasets require using CMS-provided tools that perform much better in C++ than python. However, downstream in your analysis or to analyze NanoAOD files, you are welcome to use whatever tools and file formats you choose.
:::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints
19 changes: 18 additions & 1 deletion learners/setup.md
@@ -2,5 +2,22 @@
title: Setup
---

This lesson requires a computer with an internet connection and a bash shell (either native Linux, macOS or Windows WSL2 Linux). You should have Docker installed and the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshop2023-lesson-docker/) finished so that you can access the containers `my_root` and `my_python` created as instructed there.
This lesson requires several software packages common in high-energy physics. This exercise is intended to be used for Fermilab's CMS Data Analysis School 2025. If you have completed the [pre-exercises](https://fnallpc.github.io/cms-das-pre-exercises/), go to the following CMSSW area and set the CMS environment to access these packages.

Connect to the LPC cluster:

```bash
kinit <YourUsername>@FNAL.GOV # if needed
ssh -XY <your-username>@cmslpc-el8.fnal.gov
```

Access your CMSSW environment:

```bash
cd ~/nobackup/cmsdas/CMSSW_13_0_10
cmsenv
mkdir root-exercise
cd root-exercise/
```

Now you are ready to do the steps of this exercise in your terminal!
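
As an optional sanity check, the sketch below looks up the commands used throughout this lesson (`root`, `python`, and `make`) on your `PATH`. It uses only the Python standard library. The helper name `check_tools` is hypothetical, and which tools `cmsenv` actually provides is an assumption you should verify in your own session.

```python
import shutil


def check_tools(tools=("root", "python", "make")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, location in check_tools().items():
        # A missing tool may simply mean the environment has not been set up yet.
        status = location if location else "NOT FOUND (did you run cmsenv?)"
        print(f"{tool:8s} {status}")
```

If a tool is reported as missing, re-running the environment setup commands above in the same terminal is a sensible first step.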
