HTML Reports¶
Generate interactive visual reports of your pipeline.
Basic Usage¶
This creates a standalone HTML file with:
- Pipeline flow diagram
- Retention funnel visualization
- Dropped rows breakdown
- Cell change history
- Interactive filtering
Report Contents¶
Pipeline Overview¶
Shows high-level statistics:
- Total rows processed
- Final row count
- Overall retention rate
- Number of steps
Retention Funnel¶
Visual representation of how rows flow through each step:
Load: ████████████████████ 1000 rows
dropna: ████████████████░░░░ 850 rows (-150)
filter: ████████████░░░░░░░░ 600 rows (-250)
merge: ██████████████████░░ 900 rows (+300)
final: ████████████████░░░░ 847 rows (-53)
Drops by Operation¶
Breakdown of which operations dropped the most rows:
| Operation | Rows Dropped | % of Total |
|---|---|---|
| DataFrame.dropna | 150 | 37% |
| DataFrame.getitem[mask] | 250 | 62% |
| DataFrame.drop_duplicates | 3 | 1% |
Cell Changes¶
For watched columns, shows modification history:
| Column | Changes | Operations |
|---|---|---|
| income | 423 | DataFrame.fillna |
| status | 89 | DataFrame.setitem |
Ghost Values¶
Last known values of dropped rows (debug mode only):
| Row ID | status | Dropped By | |
|---|---|---|---|
| 42 | alice@... | active | dropna |
| 156 | bob@... | inactive | filter |
Options¶
Custom Title¶
Include Raw Data¶
Adds a data preview table to the report. Use with caution for large DataFrames.
Minimal Report¶
Generates a simpler report without charts (faster, smaller file size).
Programmatic Access¶
If you need the report data without HTML:
# Get report data as dict
dbg = tp.debug.inspect()
report_data = dbg.export("dict")
# Contains:
# - steps: list of all operations
# - drops: dropped row details
# - changes: cell modifications
# - stats: summary statistics
Viewing Reports¶
The generated HTML is self-contained (no external dependencies). Open it in any browser:
# macOS
open pipeline_audit.html
# Linux
xdg-open pipeline_audit.html
# Windows
start pipeline_audit.html
Integration with Notebooks¶
In Jupyter notebooks, you can display reports inline:
Or use the display helper:
Example Report¶
import tracepipe as tp
import pandas as pd
tp.enable(mode="debug", watch=["income", "status"])
# Sample pipeline
df = pd.read_csv("customers.csv")
df = df.dropna(subset=["email"])
df["income"] = df["income"].fillna(df["income"].median())
df = df[df["age"] >= 18]
df = df.merge(segments, on="segment_id")
# Generate comprehensive report
tp.report(df, "customer_pipeline_audit.html", title="Customer ETL Pipeline")