Debug API¶

Advanced debugging and inspection tools.

Accessing Debug Tools¶

inspect¶

tp.debug.inspect() -> DebugInspector

Get a debug inspector for raw lineage access.

Returns: DebugInspector

Example:

dbg = tp.debug.inspect()
print(f"Steps recorded: {len(dbg.steps)}")

DebugInspector Properties¶

steps¶

dbg.steps -> list[Step]

All recorded pipeline steps.

Each Step contains:

Attribute	Type	Description
`.operation`	`str`	Operation name
`.input_shape`	`tuple`	Input (rows, cols)
`.output_shape`	`tuple`	Output (rows, cols)
`.timestamp`	`datetime`	When it occurred
`.stage`	`str`	Pipeline stage (if set)

Example:

for step in dbg.steps:
    print(f"{step.operation}: {step.input_shape} → {step.output_shape}")

DebugInspector Methods¶

dropped_rows¶

dbg.dropped_rows() -> set[int]

Get IDs of all dropped rows.

Example:

dropped = dbg.dropped_rows()
print(f"Total dropped: {len(dropped)}")

explain_row¶

dbg.explain_row(row_id: int) -> RowExplanation

Get detailed explanation for a specific row.

Parameters:

Parameter	Type	Description
`row_id`	`int`	Internal row ID

Returns: RowExplanation

Example:

for rid in list(dbg.dropped_rows())[:5]:
    explanation = dbg.explain_row(rid)
    print(f"Row {rid}: {explanation.status}")

explain_group¶

dbg.explain_group(group_key: Any) -> GroupExplanation

Explain which rows belonged to a group (after groupby).

Example:

# After: df.groupby("category").sum()
explanation = dbg.explain_group("Electronics")
print(f"Group 'Electronics' had {len(explanation.member_ids)} rows")

get_ghost_values¶

dbg.get_ghost_values(row_id: int) -> dict[str, Any] | None

Get last known values of a dropped row.

Example:

dropped_rid = list(dbg.dropped_rows())[0]
ghost = dbg.get_ghost_values(dropped_rid)
if ghost:
    print(f"Last values: {ghost}")

stats¶

dbg.stats() -> dict

Get tracking statistics.

Returns:

{
    "steps_recorded": 15,
    "rows_tracked": 1000,
    "rows_dropped": 153,
    "cells_tracked": 5000,
    "memory_bytes": 102400,  # If psutil available
}

export¶

dbg.export(format: str, path: str | None = None) -> dict | None

Export lineage data.

Parameters:

Parameter	Type	Description
`format`	`str`	`"json"`, `"dict"`, or `"csv"`
`path`	`str`	Output file (optional)

Returns: Data dict if path is None, else None.

Example:

# Export to file
dbg.export("json", "lineage.json")

# Get as dict
data = dbg.export("dict")

Complete Example¶

import tracepipe as tp

tp.enable(mode="debug", watch=["price", "status"])

# Run pipeline
df = pd.read_csv("data.csv")
df = df.dropna()
df["price"] = df["price"] * 1.1
df = df[df["price"] > 10]

# Deep inspection
dbg = tp.debug.inspect()

# Review all steps
print("Pipeline steps:")
for i, step in enumerate(dbg.steps):
    print(f"  {i+1}. {step.operation}")
    print(f"     {step.input_shape} → {step.output_shape}")

# Investigate dropped rows
dropped = dbg.dropped_rows()
print(f"\nDropped {len(dropped)} rows")

# Look at specific dropped rows
for rid in list(dropped)[:3]:
    ghost = dbg.get_ghost_values(rid)
    if ghost:
        print(f"  Row {rid}: price was {ghost.get('price')}")

# Export for external analysis
dbg.export("json", "pipeline_lineage.json")

# Stats
stats = dbg.stats()
print(f"\nMemory used: {stats.get('memory_bytes', 'N/A')} bytes")