Cutflow plotting#
This directory contains scripts and utilities for creating cutflow plots from PocketCoffea processor outputs.
Overview#
The scripts analyze the cutflow and sumw dictionaries from PocketCoffea .coffea files to create:
Cutflow plots: Absolute number of events for each category (with ratio plots showing ratio to initial category)
Sum of weights plots: Weighted number of events for each category
The scripts automatically:
Aggregate datasets belonging to the same sample using the datasets_metadata information
Handle subsamples correctly
Create separate plots for each sample (with optional year separation)
Support data and MC samples
Use CMS style formatting with proper colors (CMS_blue for cutflow, CMS_orange for sum of weights)
Apply smart y-axis limits and scientific notation formatting
Generate both regular and ratio plots for cutflow data
Files#
Core Utilities#
pocket_coffea/utils/cutflow_utils.py: Core plotting functions and utilities that can be imported
pocket_coffea/scripts/plot/plot_cutflow.py: Standalone command-line script for cutflow plotting
Basic Usage#
Command-Line Script (CLI Integration)#
The cutflow plotting is integrated into the PocketCoffea CLI. After installing the package, you can use:
# Basic usage with the new CLI command
plot-cutflow -i output_all.coffea -o cutflow_plots
# All options available
plot-cutflow \
--input-file output_all.coffea \
--output-dir cutflow_plots \
--exclude-categories initial skim \
--only-samples TTToSemiLeptonic DYJetsToLL \
--log-y \
--figsize 12,8 \
--output-format pdf \
--summary-only
Standalone Python Script#
You can also run the script directly:
# Use the full-featured script
python pocket_coffea/scripts/plot/plot_cutflow.py -i output_all.coffea -o cutflow_plots
# All options
python pocket_coffea/scripts/plot/plot_cutflow.py \
--input-file output_all.coffea \
--output-dir cutflow_plots \
--exclude-categories initial skim \
--only-samples TTToSemiLeptonic DYJetsToLL \
--log-y \
--figsize 12,8 \
--output-format pdf
Using as a Python Module#
from coffea.util import load
from pocket_coffea.utils.cutflow_utils import plot_cutflow_from_output, print_cutflow_summary

# Load your coffea file
output = load('output_all.coffea')

# Print summary
print_cutflow_summary(output)

# Create plots with all enhanced features
saved_files = plot_cutflow_from_output(
    output,
    output_dir='cutflow_plots',
    exclude_categories=['initial', 'skim'],
    only_samples=['TTToSemiLeptonic', 'DYJetsToLL'],
    figsize=(12, 8),
    log_y=True,
    output_format='pdf'
)

print(f"Created {len(saved_files['cutflow'])} cutflow plots (includes ratio versions)")
print(f"Created {len(saved_files['sumw'])} sum of weights plots")

# The cutflow plots automatically include both regular and ratio versions
# Ratio plots show the ratio of each category to the initial category
Data Structure#
The scripts expect PocketCoffea output with the following structure:
output = {
    'cutflow': {
        'initial': {'dataset1': 1000, 'dataset2': 2000, ...},
        'skim': {'dataset1': 800, 'dataset2': 1600, ...},
        'presel': {'dataset1': 600, 'dataset2': 1200, ...},
        'category1': {
            'dataset1': {'sample1': 400, 'sample1__subsample': 100},
            'dataset2': {'sample2': 800},
            ...
        },
        ...
    },
    'sumw': {
        # Same structure as cutflow but with weighted events
        ...
    },
    'datasets_metadata': {
        'by_dataset': {
            'dataset1': {'sample': 'sample1', 'year': '2018', ...},
            'dataset2': {'sample': 'sample2', 'year': '2018', ...},
            ...
        }
    }
}
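The structure above mixes two shapes: the early categories map each dataset to a single number, while later categories map each dataset to a dictionary keyed by sample or subsample. As a minimal sketch (not part of PocketCoffea; the helper name iter_counts and the toy numbers are illustrative assumptions), both shapes can be walked uniformly like this:

```python
def iter_counts(counts):
    """Yield (category, dataset, sample_key, value) rows from a cutflow-like dict.

    sample_key is None for flat entries such as 'initial', 'skim', 'presel'.
    """
    for category, per_dataset in counts.items():
        for dataset, value in per_dataset.items():
            if isinstance(value, dict):
                # Nested form: one entry per sample or subsample key.
                for sample_key, count in value.items():
                    yield category, dataset, sample_key, count
            else:
                # Flat form: a single number per dataset.
                yield category, dataset, None, value


# Toy input mirroring the documented layout (numbers are illustrative only).
toy_cutflow = {
    "initial": {"dataset1": 1000},
    "category1": {"dataset1": {"sample1": 400, "sample1__subsample": 100}},
}
rows = list(iter_counts(toy_cutflow))
for row in rows:
    print(row)
```

The same walk should apply to the sumw dictionary, since it is described as having the same structure.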
Technical Details#
Color Scheme and Styling#
The plotting utilities use the official CMS color palette:
Cutflow plots: CMS_blue (#3f90da)
Sum of weights plots: CMS_orange (#ffa90e)
Plot Features#
Two-panel layout: Cutflow plots automatically include ratio panels showing efficiency relative to initial category
Scientific notation: Smart formatting for large numbers with LaTeX rendering
Year separation: Automatic handling of multi-year datasets with separate plots per year
Font standardization: Consistent 12pt font size for all labels and annotations
CMS labeling: Proper CMS preliminary labels and luminosity information
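The ratio panel amounts to dividing each category's count by the count of a reference category. The following is a minimal sketch of that arithmetic only, not the library's plotting code; the function name ratios_to_reference, the default reference "initial", and the NaN fallback for an empty reference are assumptions made for illustration.

```python
import math


def ratios_to_reference(counts, categories, reference="initial"):
    """Return {category: count / reference count} for the given categories."""
    ref = counts.get(reference, 0)
    ratios = {}
    for category in categories:
        # Avoid a ZeroDivisionError when the reference category is empty or missing.
        ratios[category] = counts[category] / ref if ref else math.nan
    return ratios


# Toy per-sample counts (illustrative numbers only).
toy_counts = {"initial": 1000, "skim": 800, "presel": 600}
ratios = ratios_to_reference(toy_counts, ["initial", "skim", "presel"])
print(ratios)
```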
Data Processing#
Sample aggregation: Automatic grouping of datasets by sample name using datasets_metadata
Subsample handling: Proper treatment of subsamples (e.g., sample__subsample naming)
Year extraction: Automatic year detection from metadata for proper luminosity labeling
Error handling: Robust processing with informative error messages and warnings
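To make the aggregation step concrete, here is a rough sketch of grouping per-dataset counts by sample. It is not the implementation of aggregate_by_sample in cutflow_utils.py; the function name sum_counts_by_sample is hypothetical, and it assumes that flat entries are attributed to the sample named in datasets_metadata while nested entries are summed under their own sample or subsample key.

```python
from collections import defaultdict


def sum_counts_by_sample(counts, by_dataset):
    """Sum counts over datasets, returning {sample_key: {category: total}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for category, per_dataset in counts.items():
        for dataset, value in per_dataset.items():
            if isinstance(value, dict):
                # Nested entry: keys are already sample or subsample names.
                for sample_key, count in value.items():
                    totals[sample_key][category] += count
            else:
                # Flat entry: look up the owning sample in the metadata.
                sample = by_dataset[dataset]["sample"]
                totals[sample][category] += value
    # Convert to plain dicts for easier printing and comparison.
    return {sample: dict(cats) for sample, cats in totals.items()}


# Toy input: two datasets that belong to the same sample (illustrative numbers).
toy_counts = {
    "initial": {"d1": 1000, "d2": 2000},
    "category1": {"d1": {"s1": 400, "s1__sub": 100}, "d2": {"s1": 800}},
}
toy_metadata = {
    "d1": {"sample": "s1", "year": "2018"},
    "d2": {"sample": "s1", "year": "2018"},
}
aggregated = sum_counts_by_sample(toy_counts, toy_metadata)
print(aggregated)
```

Subsample keys such as s1__sub are kept as separate entries here rather than merged into the parent sample, because the source does not state how the two relate numerically.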
Output#
The scripts create:
Individual Sample Plots#
cutflow_<sample_name>_<year>.png: Bar plot showing event counts for each category
cutflow_with_ratio_<sample_name>_<year>.png: Two-panel plot with cutflow (top) and ratio to initial category (bottom)
sum_of_weights_<sample_name>_<year>.png: Bar plot showing weighted event counts for each category
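When checking a run from a script, it can help to compute the file names to expect. The sketch below only follows the naming pattern listed above; the helper expected_plot_paths is hypothetical, and the assumption that the file extension follows the chosen output format is not confirmed by the source, which lists .png names only.

```python
from pathlib import Path


def expected_plot_paths(output_dir, sample, year, output_format="png"):
    """Build the three per-sample plot paths following the documented pattern."""
    stems = ["cutflow", "cutflow_with_ratio", "sum_of_weights"]
    return [
        Path(output_dir) / f"{stem}_{sample}_{year}.{output_format}"
        for stem in stems
    ]


paths = expected_plot_paths("cutflow_plots", "TTToSemiLeptonic", "2018")
for path in paths:
    print(path)
```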
Options#
Common Options#
--exclude-categories: Skip certain categories (e.g., ‘initial’, ‘skim’)
--only-samples: Only create plots for specified samples
--log-y: Use logarithmic y-axis scale
--figsize: Figure size as ‘width,height’ (default: ‘10,6’)
--output-format: Output format (png, pdf, svg, etc.)
--summary-only: Only print summary information without creating plots
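The CLI takes the figure size as a 'width,height' string, while the Python function takes a tuple such as (12, 8). A small sketch of converting one into the other follows; parse_figsize is a hypothetical helper written for illustration, and the script's own parsing may differ.

```python
def parse_figsize(text):
    """Convert a 'width,height' string into a (width, height) tuple of floats."""
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'width,height', got {text!r}")
    width, height = (float(part) for part in parts)
    return width, height


figsize = parse_figsize("12,8")
print(figsize)
```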
Examples#
Typical Workflow#
Check what’s in your file:
plot-cutflow -i output_all.coffea --summary-only
Create basic plots with CLI:
plot-cutflow -i output_all.coffea -o cutflow_plots
Refine with specific samples and options:
plot-cutflow -i output_all.coffea -o cutflow_plots \
--only-samples TTToSemiLeptonic DYJetsToLL DATA_SingleMuon \
--exclude-categories initial skim \
--log-y \
--figsize 12,8 \
--output-format pdf
Advanced Usage Examples#
# Using the enhanced utilities for custom workflows
from coffea.util import load
from pocket_coffea.utils.cutflow_utils import (
    plot_cutflow_from_output,
    aggregate_by_sample,
    plot_sample_cutflow
)

# Load and process data
output = load('output_all.coffea')

# Get aggregated data with year separation
cutflow_by_sample = aggregate_by_sample(
    output['cutflow'],
    categories=['presel', 'category1', 'category2'],
    datasets_metadata=output['datasets_metadata']['by_dataset'],
    separate_years=True  # Creates separate entries for each year
)

# Create custom plots with ratio panels
for year, samples in cutflow_by_sample.items():
    for sample, data in samples.items():
        filepaths = plot_sample_cutflow(
            sample=sample,
            sample_data=data,
            year=year,
            categories=['presel', 'category1', 'category2'],
            plot_type='Cutflow',
            ylabel='Number of Events',
            output_dir='custom_plots',
            figsize=(12, 8),
            log_y=True,
            output_format='pdf',
            with_ratio=True  # Include ratio panel
        )