Skip to content

HTML Reports

Generate interactive visual reports of your pipeline.

Basic Usage

tp.report(df, "pipeline_audit.html")

This creates a standalone HTML file with:

  • Pipeline flow diagram
  • Retention funnel visualization
  • Dropped rows breakdown
  • Cell change history
  • Interactive filtering

Report Contents

Pipeline Overview

Shows high-level statistics:

  • Total rows processed
  • Final row count
  • Overall retention rate
  • Number of steps

Retention Funnel

Visual representation of how rows flow through each step:

Load:     ████████████████████ 1000 rows
dropna:   ████████████████░░░░  850 rows (-150)
filter:   ████████████░░░░░░░░  600 rows (-250)
merge:    ██████████████████░░  900 rows (+300)
final:    ████████████████░░░░  847 rows (-53)

Drops by Operation

Breakdown of which operations dropped the most rows:

Operation Rows Dropped % of Total
DataFrame.dropna 150 37%
DataFrame.getitem[mask] 250 62%
DataFrame.drop_duplicates 3 1%

Cell Changes

For watched columns, shows modification history:

Column Changes Operations
income 423 DataFrame.fillna
status 89 DataFrame.setitem

Ghost Values

Last known values of dropped rows (debug mode only):

Row ID email status Dropped By
42 alice@... active dropna
156 bob@... inactive filter

Options

Custom Title

tp.report(df, "audit.html", title="Customer Pipeline - Q4 2024")

Include Raw Data

tp.report(df, "audit.html", include_data=True)

Adds a data preview table to the report. Use with caution for large DataFrames.

Minimal Report

tp.report(df, "audit.html", minimal=True)

Generates a simpler report without charts (faster, smaller file size).

Programmatic Access

If you need the report data without HTML:

# Get report data as dict
dbg = tp.debug.inspect()
report_data = dbg.export("dict")

# Contains:
# - steps: list of all operations
# - drops: dropped row details
# - changes: cell modifications
# - stats: summary statistics

Viewing Reports

The generated HTML is self-contained (no external dependencies). Open it in any browser:

# macOS
open pipeline_audit.html

# Linux
xdg-open pipeline_audit.html

# Windows
start pipeline_audit.html

Integration with Notebooks

In Jupyter notebooks, you can display reports inline:

from IPython.display import HTML

tp.report(df, "audit.html")
HTML("audit.html")

Or use the display helper:

# If available
tp.display_report(df)  # Renders inline in notebook

Example Report

import tracepipe as tp
import pandas as pd

tp.enable(mode="debug", watch=["income", "status"])

# Sample pipeline
df = pd.read_csv("customers.csv")
df = df.dropna(subset=["email"])
df["income"] = df["income"].fillna(df["income"].median())
df = df[df["age"] >= 18]
df = df.merge(segments, on="segment_id")

# Generate comprehensive report
tp.report(df, "customer_pipeline_audit.html", title="Customer ETL Pipeline")