Cutflow plotting#

This directory contains scripts and utilities for creating cutflow plots from PocketCoffea processor outputs.

Overview#

The scripts analyze the cutflow and sumw dictionaries from PocketCoffea .coffea files to create:

  • Cutflow plots: Absolute number of events in each category, plus ratio plots showing each category relative to the initial category

  • Sum of weights plots: Weighted number of events for each category

The scripts automatically:

  • Aggregate datasets belonging to the same sample using the datasets_metadata information

  • Handle subsamples correctly

  • Create separate plots for each sample (with optional year separation)

  • Support data and MC samples

  • Use CMS style formatting with proper colors (CMS_blue for cutflow, CMS_orange for sum of weights)

  • Apply smart y-axis limits and scientific notation formatting

  • Generate both regular and ratio plots for cutflow data

Files#

Core Utilities#

  • pocket_coffea/utils/cutflow_utils.py: Core plotting functions and utilities that can be imported

  • pocket_coffea/scripts/plot/plot_cutflow.py: Standalone command-line script for cutflow plotting

Basic Usage#

Command-Line Script (CLI Integration)#

The cutflow plotting is integrated into the PocketCoffea CLI. After installing the package, you can use:

# Basic usage with the new CLI command
plot-cutflow -i output_all.coffea -o cutflow_plots

# All available options (--summary-only prints the summary and skips plot creation; drop it to produce plots)
plot-cutflow \
    --input-file output_all.coffea \
    --output-dir cutflow_plots \
    --exclude-categories initial skim \
    --only-samples TTToSemiLeptonic DYJetsToLL \
    --log-y \
    --figsize 12,8 \
    --output-format pdf \
    --summary-only

Standalone Python Script#

You can also run the script directly:

# Use the full-featured script
python pocket_coffea/scripts/plot/plot_cutflow.py -i output_all.coffea -o cutflow_plots

# All options
python pocket_coffea/scripts/plot/plot_cutflow.py \
    --input-file output_all.coffea \
    --output-dir cutflow_plots \
    --exclude-categories initial skim \
    --only-samples TTToSemiLeptonic DYJetsToLL \
    --log-y \
    --figsize 12,8 \
    --output-format pdf

Using as a Python Module#

from coffea.util import load
from pocket_coffea.utils.cutflow_utils import plot_cutflow_from_output, print_cutflow_summary

# Load your coffea file
output = load('output_all.coffea')

# Print summary
print_cutflow_summary(output)

# Create plots with all enhanced features
saved_files = plot_cutflow_from_output(
    output,
    output_dir='cutflow_plots',
    exclude_categories=['initial', 'skim'],
    only_samples=['TTToSemiLeptonic', 'DYJetsToLL'],
    figsize=(12, 8),
    log_y=True,
    output_format='pdf'
)

print(f"Created {len(saved_files['cutflow'])} cutflow plots (includes ratio versions)")
print(f"Created {len(saved_files['sumw'])} sum of weights plots")

# The cutflow plots automatically include both regular and ratio versions
# Ratio plots show the ratio of each category to the initial category

Data Structure#

The scripts expect PocketCoffea output with the following structure:

output = {
    'cutflow': {
        'initial': {'dataset1': 1000, 'dataset2': 2000, ...},
        'skim': {'dataset1': 800, 'dataset2': 1600, ...},
        'presel': {'dataset1': 600, 'dataset2': 1200, ...},
        'category1': {
            'dataset1': {'sample1': 400, 'sample1__subsample': 100},
            'dataset2': {'sample2': 800},
            ...
        },
        ...
    },
    'sumw': {
        # Same structure as cutflow but with weighted events
        ...
    },
    'datasets_metadata': {
        'by_dataset': {
            'dataset1': {'sample': 'sample1', 'year': '2018', ...},
            'dataset2': {'sample': 'sample2', 'year': '2018', ...},
            ...
        }
    }
}
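
The structure above mixes two kinds of leaves: a plain count per dataset (for example under 'initial') and a nested dictionary of sample or subsample counts per dataset (for example under 'category1'). The following is a minimal sketch of walking both shapes uniformly. The helper name `iter_cutflow_entries` is hypothetical and is not part of pocket_coffea; the example dictionary is a truncated copy of the structure shown above.

```python
from typing import Iterator, Optional, Tuple


def iter_cutflow_entries(cutflow: dict) -> Iterator[Tuple[str, str, Optional[str], float]]:
    """Yield (category, dataset, sample_key, count) for every leaf.

    sample_key is None when the dataset maps directly to a count.
    (Hypothetical helper, written only for illustration.)
    """
    for category, per_dataset in cutflow.items():
        for dataset, value in per_dataset.items():
            if isinstance(value, dict):
                # Nested shape: dataset -> {sample or sample__subsample: count}
                for sample_key, count in value.items():
                    yield category, dataset, sample_key, count
            else:
                # Flat shape: dataset -> count
                yield category, dataset, None, value


example_cutflow = {
    "initial": {"dataset1": 1000, "dataset2": 2000},
    "category1": {
        "dataset1": {"sample1": 400, "sample1__subsample": 100},
        "dataset2": {"sample2": 800},
    },
}

rows = list(iter_cutflow_entries(example_cutflow))
for row in rows:
    print(row)
```

Flattening to rows like this can make it easier to inspect an unfamiliar output file before plotting it.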

Technical Details#

Color Scheme and Styling#

The plotting utilities use the official CMS color palette:

  • Cutflow plots: CMS_blue (#3f90da)

  • Sum of weights plots: CMS_orange (#ffa90e)

Plot Features#

  • Two-panel layout: Cutflow plots automatically include a ratio panel showing the efficiency of each category relative to the initial category

  • Scientific notation: Smart formatting for large numbers with LaTeX rendering

  • Year separation: Automatic handling of multi-year datasets with separate plots per year

  • Font standardization: Consistent 12pt font size for all labels and annotations

  • CMS labeling: Proper CMS preliminary labels and luminosity information
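
The quantity shown in the ratio panel can be sketched in a few lines of plain Python. This is an illustration of the ratio-to-initial idea only, not the plotting code in cutflow_utils.py; the function name `ratios_to_reference` and the zero-count handling are assumptions.

```python
import math


def ratios_to_reference(counts: dict, reference: str = "initial") -> dict:
    """Return each category's count divided by the reference category's count.

    (Hypothetical helper; how the real utilities treat an empty reference
    category is not specified here, so NaN is used as a placeholder.)
    """
    ref = counts[reference]
    if ref == 0:
        return {category: math.nan for category in counts}
    return {category: n / ref for category, n in counts.items()}


counts = {"initial": 1000, "skim": 800, "presel": 600}
ratios = ratios_to_reference(counts)
print(ratios)  # {'initial': 1.0, 'skim': 0.8, 'presel': 0.6}
```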

Data Processing#

  • Sample aggregation: Automatic grouping of datasets by sample name using datasets_metadata

  • Subsample handling: Proper treatment of subsamples (e.g., sample__subsample naming)

  • Year extraction: Automatic year detection from metadata for proper luminosity labeling

  • Error handling: Robust processing with informative error messages and warnings
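
The aggregation steps listed above can be sketched as follows for a single category. This is a simplified illustration under stated assumptions, not the implementation of aggregate_by_sample: the function name `aggregate_counts` is hypothetical, results are keyed by year first, and sample__subsample keys are kept as separate entries rather than merged into their parent sample.

```python
from collections import defaultdict


def aggregate_counts(per_dataset: dict, by_dataset_meta: dict) -> dict:
    """Sum per-dataset counts into {year: {sample_or_subsample: total}}.

    (Hypothetical helper for illustration only.)
    """
    totals = defaultdict(lambda: defaultdict(float))
    for dataset, value in per_dataset.items():
        meta = by_dataset_meta[dataset]
        year = meta["year"]
        if isinstance(value, dict):
            # Nested entry: keys are already sample or sample__subsample names.
            for name, count in value.items():
                totals[year][name] += count
        else:
            # Flat entry: attribute the count to the dataset's sample.
            totals[year][meta["sample"]] += value
    return {year: dict(samples) for year, samples in totals.items()}


# Made-up dataset names and counts, following the documented structure.
category_counts = {
    "dsA": {"sample1": 400, "sample1__subsample": 100},
    "dsB": {"sample1": 200},
    "dsC": 50,
}
metadata = {
    "dsA": {"sample": "sample1", "year": "2018"},
    "dsB": {"sample": "sample1", "year": "2018"},
    "dsC": {"sample": "sample2", "year": "2017"},
}

result = aggregate_counts(category_counts, metadata)
print(result)
```

Here dsA and dsB belong to the same sample and year, so their sample1 counts are summed, while the subsample entry and the 2017 dataset stay separate.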

Output#

The scripts create:

Individual Sample Plots#

  • cutflow_<sample_name>_<year>.png: Bar plot showing event counts for each category

  • cutflow_with_ratio_<sample_name>_<year>.png: Two-panel plot with cutflow (top) and ratio to initial category (bottom)

  • sum_of_weights_<sample_name>_<year>.png: Bar plot showing weighted event counts for each category
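
To check that a run produced the files listed above, the naming pattern can be reproduced with a small helper. This is a sketch: `expected_plot_paths` is a hypothetical name, and the assumption that the file extension follows the chosen output format (rather than always being .png) is not confirmed by the list above.

```python
from pathlib import Path


def expected_plot_paths(output_dir, sample, year, output_format="png"):
    """Build the three per-sample file paths following the documented pattern.

    (Hypothetical helper; the extension handling is an assumption.)
    """
    stems = ["cutflow", "cutflow_with_ratio", "sum_of_weights"]
    return [
        Path(output_dir) / f"{stem}_{sample}_{year}.{output_format}"
        for stem in stems
    ]


paths = expected_plot_paths("cutflow_plots", "TTToSemiLeptonic", "2018")
for path in paths:
    # Report whether each expected file is present on disk.
    print(path, "exists" if path.exists() else "missing")
```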

Options#

Common Options#

  • --exclude-categories: Skip certain categories (e.g., ‘initial’, ‘skim’)

  • --only-samples: Only create plots for specified samples

  • --log-y: Use logarithmic y-axis scale

  • --figsize: Figure size as ‘width,height’ (default: ‘10,6’)

  • --output-format: Output format (png, pdf, svg, etc.)

  • --summary-only: Only print summary information without creating plots
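
For readers wrapping these options in their own tooling, the sketch below shows one way such an interface could be parsed with argparse from the standard library. It is not the parser used by plot_cutflow.py, which may be built differently. The defaults for --output-dir and --output-format are assumptions; only the '10,6' figure size default comes from the list above.

```python
import argparse


def parse_figsize(text: str) -> tuple:
    """Convert a 'width,height' string into a (width, height) tuple of floats."""
    parts = text.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError("figsize must be 'width,height'")
    try:
        width, height = (float(part) for part in parts)
    except ValueError:
        raise argparse.ArgumentTypeError("figsize values must be numbers")
    return (width, height)


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; the program name is made up.
    parser = argparse.ArgumentParser(prog="plot-cutflow-sketch")
    parser.add_argument("-i", "--input-file", required=True)
    parser.add_argument("-o", "--output-dir", default="cutflow_plots")  # assumed default
    parser.add_argument("--exclude-categories", nargs="+", default=[])
    parser.add_argument("--only-samples", nargs="+", default=None)
    parser.add_argument("--log-y", action="store_true")
    parser.add_argument("--figsize", type=parse_figsize, default=(10.0, 6.0))
    parser.add_argument("--output-format", default="png")  # assumed default
    parser.add_argument("--summary-only", action="store_true")
    return parser


args = build_parser().parse_args(
    ["-i", "output_all.coffea", "--exclude-categories", "initial", "skim",
     "--figsize", "12,8", "--log-y"]
)
print(args.exclude_categories, args.figsize, args.log_y)
```

Using a dedicated type function for --figsize keeps the 'width,height' validation in one place and yields a tuple that can be passed straight to a figsize keyword argument.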

Examples#

Typical Workflow#

  1. Check what’s in your file:

    plot-cutflow -i output_all.coffea --summary-only
    
  2. Create basic plots with CLI:

    plot-cutflow -i output_all.coffea -o cutflow_plots
    
  3. Refine with specific samples and options:

    plot-cutflow -i output_all.coffea -o cutflow_plots \
        --only-samples TTToSemiLeptonic DYJetsToLL DATA_SingleMuon \
        --exclude-categories initial skim \
        --log-y \
        --figsize 12,8 \
        --output-format pdf
    

Advanced Usage Examples#

# Using the enhanced utilities for custom workflows
from coffea.util import load
from pocket_coffea.utils.cutflow_utils import (
    plot_cutflow_from_output,
    aggregate_by_sample,
    plot_sample_cutflow
)

# Load and process data
output = load('output_all.coffea')

# Get aggregated data with year separation
cutflow_by_sample = aggregate_by_sample(
    output['cutflow'],
    categories=['presel', 'category1', 'category2'],
    datasets_metadata=output['datasets_metadata']['by_dataset'],
    separate_years=True  # Creates separate entries for each year
)

# Create custom plots with ratio panels
for year, samples in cutflow_by_sample.items():
    for sample, data in samples.items():
        filepaths = plot_sample_cutflow(
            sample=sample,
            sample_data=data,
            year=year,
            categories=['presel', 'category1', 'category2'],
            plot_type='Cutflow',
            ylabel='Number of Events',
            output_dir='custom_plots',
            figsize=(12, 8),
            log_y=True,
            output_format='pdf',
            with_ratio=True  # Include ratio panel
        )