# Changelog
All notable changes to TracePipe will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [0.4.2] - 2026-02-04
### Fixed
- `CheckResult` change tracking: Added `n_changes` and `changes_by_op` properties in debug mode to track value changes across pipeline steps
- `TraceResult` status fields: Added `status`, `dropped_by`, and `dropped_at_step` properties for clearer dropped-row analysis
- `DiffResult` completeness: Added `cells_changed`, `changes_by_column`, `rows_unchanged`, and `changed_rows` for detailed snapshot comparison
- Ghost value API: Implemented `dbg.get_ghost_values(row_id)` for retrieving the last known values of dropped rows
- Merge provenance: `trace.origin` and `trace.merge_origin` are now properly populated for merged rows
- Documentation alignment: All documented APIs now match the actual implementation, with comprehensive test coverage
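
The `DiffResult` fields above describe a cell-level comparison of two snapshots. The sketch below shows one way to compute those quantities with plain pandas. It compares only the row labels and columns the two snapshots share. The helper name `diff_snapshots` and its dict output are hypothetical and are not TracePipe's implementation. The sketch also assumes unique index labels.

```python
import pandas as pd


def diff_snapshots(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical helper: cell-level diff of two snapshots with unique row labels."""
    # Compare only rows and columns present in both snapshots, in the same order.
    rows = before.index.intersection(after.index)
    cols = before.columns.intersection(after.columns)
    b = before.loc[rows, cols]
    a = after.loc[rows, cols]

    # A cell counts as changed unless the values are equal or both are missing.
    changed = (b != a) & ~(b.isna() & a.isna())
    row_changed = changed.any(axis=1)

    return {
        "cells_changed": int(changed.to_numpy().sum()),
        "changes_by_column": {c: int(n) for c, n in changed.sum().items() if n},
        "rows_unchanged": int((~row_changed).sum()),
        "changed_rows": changed.index[row_changed].tolist(),
    }


before = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, float("nan"), 3.0]})
after = pd.DataFrame({"a": [1, 5, 3], "b": [1.0, float("nan"), 4.0]})
print(diff_snapshots(before, after))
```

The explicit both-missing mask matters because `NaN != NaN` is `True` in pandas. Without the mask, an untouched missing value would be reported as a change.
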
### Changed
- `tp.trace()` API enhancement: Added a `row_id=` parameter for explicit internal row ID tracking
  - `row=` now strictly refers to the DataFrame positional index
  - `row_id=` refers to TracePipe's internal row identifier (stable across operations)
  - Supports tracing dropped rows by ID: `tp.trace(df, row_id=42)`
- `tp.why()` API enhancement: Added a `row_id=` parameter matching the `tp.trace()` signature
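
The split between `row=` and `row_id=` matters because positions shift when rows are dropped, while a stable identifier does not. The plain-pandas sketch below shows that difference. A hypothetical `_rid` column stands in for TracePipe's internal row identifier; TracePipe's real storage is not shown here, and no TracePipe call is made.

```python
import pandas as pd

# Hypothetical stand-in for an internal row ID: a column assigned once,
# before any filtering, that pandas carries along with each row.
df = pd.DataFrame({"amount": [10, -5, 30, 40]})
df["_rid"] = range(len(df))

filtered = df[df["amount"] > 0]  # drops the row whose _rid is 1

# Positional lookup (what row= means): position 2 is now a different record.
by_position = filtered.iloc[2]
# Stable-ID lookup (what row_id= means): _rid 2 is still the same record.
by_rid = filtered[filtered["_rid"] == 2].iloc[0]

print(df.iloc[2]["amount"], by_position["amount"], by_rid["amount"])
```
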
### Added
- Comprehensive test suite (`test_doc_api_alignment.py`) with 27 tests validating documented API features
- Better error messages for out-of-bounds row access
## [0.4.1] - 2026-02-04
### Fixed
- Fully implemented `CheckResult` convenience properties (`.passed`, `.retention`, `.n_dropped`, `.n_steps`, `.drops_by_op`)
- Added comprehensive tests for the `CheckResult` API to ensure the properties work correctly
- Properties now properly access the underlying `.facts` dictionary for all metrics
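
This entry says the convenience properties read from an underlying `.facts` dictionary. The sketch below shows that pattern with a small dataclass. `CheckResultSketch` is a hypothetical stand-in, not TracePipe's `CheckResult`. The fact keys (`rows_in`, `rows_out`, `n_steps`, `drops_by_op`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResultSketch:
    """Hypothetical stand-in for CheckResult: properties derived from a facts dict.

    The fact keys used here (rows_in, rows_out, n_steps, drops_by_op) are
    assumptions made for illustration.
    """

    ok: bool
    facts: dict = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Alias for .ok, as listed in the 0.4.0 notes.
        return self.ok

    @property
    def n_dropped(self) -> int:
        return self.facts["rows_in"] - self.facts["rows_out"]

    @property
    def retention(self) -> float:
        # Fraction of input rows that survived, in [0.0, 1.0].
        # Treating an empty input as full retention is a choice made for this sketch.
        rows_in = self.facts["rows_in"]
        return self.facts["rows_out"] / rows_in if rows_in else 1.0

    @property
    def n_steps(self) -> int:
        return self.facts["n_steps"]

    @property
    def drops_by_op(self) -> dict:
        # Return a copy so callers cannot mutate the stored facts.
        return dict(self.facts.get("drops_by_op", {}))


result = CheckResultSketch(
    ok=True,
    facts={"rows_in": 200, "rows_out": 150, "n_steps": 3,
           "drops_by_op": {"dropna": 30, "query": 20}},
)
print(result.passed, result.retention, result.n_dropped)
```
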
### Changed
- Cleaned up example files and test scripts
## [0.4.0] - 2026-02-04
### Added
- Full row provenance for `pd.concat(axis=0)`: Row IDs are now preserved through concatenation
  - Each result row maintains its original RID from the source DataFrame
  - `ConcatMapping` tracks which source DataFrame each row came from
  - Concat steps are now marked `FULL` completeness
- Duplicate drop provenance in debug mode: `drop_duplicates` now tracks which row "won"
  - `DuplicateDropMapping` maps dropped rows to their kept representative
  - Supports `keep='first'`, `keep='last'`, and `keep=False`
  - Uses `hash_pandas_object` for fast, NaN-safe key comparison
- Clean `TraceResult` API for provenance:
  - `trace.origin`: Unified origin, either `{"type": "concat", "source_df": 1}` or `{"type": "merge", ...}`
  - `trace.representative`: For dedup drops, e.g. `{"kept_rid": 42, "subset": ["key"], "keep": "first"}`
  - No need to access internal `.store` methods
- Clean `CheckResult` API:
  - `result.passed`: Alias for `.ok`
  - `result.retention`: Row retention rate (0.0-1.0)
  - `result.n_dropped`, `result.n_steps`, `result.drops_by_op`
  - All properties discoverable via autocomplete
- Comprehensive test suite: 38 new tests covering concat, dedup, and the `TraceResult` API
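
The dedup provenance above maps each dropped row to the row that `drop_duplicates` kept. The sketch below shows that idea with `pandas.util.hash_pandas_object`. The helper `duplicate_representatives` is hypothetical and is not TracePipe's `DuplicateDropMapping`. It works on index labels rather than internal RIDs. It maps rows dropped under `keep=False` to `None`, because no representative survives. Like any hash-based comparison, it can in principle report a false match on a 64-bit hash collision.

```python
import pandas as pd


def duplicate_representatives(df, subset, keep="first"):
    """Hypothetical helper: map each label drop_duplicates would drop to the label it keeps."""
    # One uint64 hash per row over the key columns. Identical NaN keys hash
    # identically, so they count as duplicates (as in drop_duplicates).
    keys = pd.util.hash_pandas_object(df[subset], index=False)

    if keep is False:
        # Every member of a duplicate group is dropped; nothing is kept.
        dropped = keys.duplicated(keep=False)
        return {label: None for label in keys.index[dropped]}

    kept_mask = ~keys.duplicated(keep=keep)
    kept_by_hash = dict(zip(keys[kept_mask], keys.index[kept_mask]))
    return {label: kept_by_hash[h] for label, h in keys[~kept_mask].items()}


df = pd.DataFrame({"key": [1.0, 2.0, 1.0, float("nan"), float("nan")]})
print(duplicate_representatives(df, ["key"], keep="first"))
print(duplicate_representatives(df, ["key"], keep="last"))
```
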
### Changed
- `wrap_concat_with_lineage` rewritten for full provenance tracking
- `axis=1` concat propagates RIDs if all inputs match; otherwise the step is marked `PARTIAL`
- `TraceResult` enhanced with `.origin` and `.representative` properties
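
Both concat rules can be illustrated with plain pandas. The row-origin rule works because `pd.concat(axis=0)` stacks its inputs in order, so each output position corresponds to exactly one source row. For the `axis=1` rule, "match" is taken here to mean that every input has an identical index; that reading is an assumption. Both helpers are hypothetical. The `"type"` and `"source_df"` keys mirror the `trace.origin` shape documented in this release, and `"source_label"` is an added illustrative key.

```python
import pandas as pd


def concat_rows_with_origin(frames):
    """Hypothetical helper: axis=0 concat plus, per output row, where it came from."""
    out = pd.concat(frames, axis=0, ignore_index=True)
    # Output position i corresponds to the i-th entry of this list,
    # because pd.concat(axis=0) keeps frames and their rows in order.
    origin = [
        {"type": "concat", "source_df": i, "source_label": label}
        for i, frame in enumerate(frames)
        for label in frame.index
    ]
    return out, origin


def axis1_completeness(frames):
    """Hypothetical rule: row identity carries over only if every input has the same index."""
    first = frames[0].index
    return "FULL" if all(f.index.equals(first) for f in frames[1:]) else "PARTIAL"


a = pd.DataFrame({"x": [1, 2]}, index=[10, 11])
b = pd.DataFrame({"x": [3]}, index=[5])
out, origin = concat_rows_with_origin([a, b])
print(origin[2])
print(axis1_completeness([a, b]))
```
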
## [0.3.5] - 2026-02-03
### Fixed
- `DataFrame.fillna` double-logging: `df.fillna({"col": 0})` now logs exactly 1 event
- Added `wrap_pandas_transform_method` with an `_in_transform_op` flag
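
An `_in_transform_op` flag suggests a re-entrancy guard. In that pattern, a wrapper logs only the outermost wrapped call, so a transform that internally calls another wrapped method still produces a single event. The sketch below re-creates the pattern in plain Python and is not TracePipe's `wrap_pandas_transform_method`. The names `log_transform`, `set_column`, and `fillna` are hypothetical, and a dict stands in for a DataFrame so the sketch needs no pandas.

```python
import functools
import threading

# Per-thread guard playing the role of an "in transform op" flag
# (an assumption for illustration, not TracePipe's code).
_state = threading.local()
events = []


def log_transform(op_name):
    """Log one event per outermost call; nested wrapped calls are not logged."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if getattr(_state, "in_transform_op", False):
                return fn(*args, **kwargs)  # an outer wrapper will log this op
            _state.in_transform_op = True
            try:
                result = fn(*args, **kwargs)
            finally:
                _state.in_transform_op = False
            events.append(op_name)
            return result
        return wrapper
    return decorator


@log_transform("setitem")
def set_column(frame, col, values):
    frame[col] = values
    return frame


@log_transform("fillna")
def fillna(frame, mapping):
    # The fill goes through set_column, which is itself wrapped.
    for col, value in mapping.items():
        set_column(frame, col, [value if v is None else v for v in frame[col]])
    return frame


table = {"col": [1, None, 3]}
fillna(table, {"col": 0})
print(events)
```

The flag lives in `threading.local()` so that one thread inside a transform does not suppress logging in another thread.
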
### Added
- Known Limitations section in README documenting concat/dedup tracking gaps
### Changed
- Test suite hardened with exact count assertions and multi-scenario tests
## [0.3.4] - 2026-02-03
### Fixed
- Event deduplication: Identical events from parallel pipelines are now deduplicated
## [0.3.3] - 2026-02-03
### Fixed
- Double-logging bug: `df['col'] = df['col'].fillna()` now logs exactly one event
- Merge warning scoping: `tp.check(df)` now only shows warnings for merges in `df`'s lineage
## [0.3.2] - 2026-02-03
### Fixed
- Merge duplicate key warnings now correctly identify which table (left/right) has duplicates
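
A check for which merge input (left or right) has repeated join keys can be written directly in pandas. The sketch below does so. The helper `duplicate_key_sides` is hypothetical and does not show how TracePipe builds its warning text.

```python
import pandas as pd


def duplicate_key_sides(left, right, on):
    """Hypothetical helper: report which merge input(s) have repeated join keys."""
    sides = []
    if left.duplicated(subset=on).any():
        sides.append("left")
    if right.duplicated(subset=on).any():
        sides.append("right")
    return sides


left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [1, 1, 2], "b": [10, 20, 30]})
print(duplicate_key_sides(left, right, on=["id"]))
print(len(left.merge(right, on="id")))
```

Here the duplicate key on the right turns 2 left rows into 3 merged rows. pandas can also enforce uniqueness at merge time: passing `validate="one_to_one"` or `validate="many_to_one"` to `DataFrame.merge` raises `pandas.errors.MergeError` when the checked side has duplicate keys.
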
## [0.3.1] - 2026-02-03
### Fixed
- Cell history now correctly chains through merge operations via lineage traversal
- `tp.why()` and `tp.trace()` show pre-merge changes for post-merge rows
- `enable()` resets accumulated state when called multiple times
### Added
- `get_row_history_with_lineage()` and `get_cell_history_with_lineage()` methods
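
Chaining cell history through merges can be framed as a graph walk. Each merged row has parent rows, such as its left and right inputs. The history of a cell is the union of the changes recorded on the row and on all of its ancestors, ordered by pipeline step. The sketch below shows that walk over plain dicts. The function and the shapes of `history` and `parents` are hypothetical, not the internals of `get_cell_history_with_lineage()`.

```python
def cell_history_with_lineage(rid, column, history, parents):
    """Hypothetical sketch: gather a cell's changes from a row and all its ancestors.

    history: {rid: [(step, column, old, new), ...]}
    parents: {rid: [parent_rid, ...]}  (e.g. a merged row's left and right inputs)
    """
    collected, stack, seen = [], [rid], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a row can be reachable through more than one path
        seen.add(current)
        collected.extend(ev for ev in history.get(current, []) if ev[1] == column)
        stack.extend(parents.get(current, []))
    # Order the combined changes by pipeline step so pre-merge edits come first.
    return sorted(collected, key=lambda ev: ev[0])


history = {
    1: [(1, "price", 10.0, 12.0)],   # change on the left input before the merge
    7: [(2, "qty", 1, 2)],           # change on the right input before the merge
    10: [(4, "price", 12.0, 15.0)],  # change on the merged row
}
parents = {10: [1, 7]}
print(cell_history_with_lineage(10, "price", history, parents))
```
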
## [0.3.0] - 2026-02-03
### Added
- MkDocs documentation site with Material theme
- Comprehensive API reference documentation
- Getting started guides and tutorials
- `tp.register()` API for manually registering DataFrames
- Configurable retention threshold in `tp.check()`
- Ghost row capture for fallback filter paths
- Data quality contracts with fluent API
- HTML report generation
- Snapshot and diff functionality
- Debug mode with cell-level tracking
- `tp.why()` for cell provenance
- `tp.trace()` for row journey
- Support for all major pandas operations
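
Ghost row capture comes down to recording a row's last known values at the moment a filter removes it. The sketch below does this for a boolean-mask filter with plain pandas. The helper `filter_with_ghosts` and its record layout are hypothetical. The `dropped_at_step` key borrows the field name from the 0.4.2 notes for readability only.

```python
import pandas as pd


def filter_with_ghosts(df, mask, ghosts, step):
    """Hypothetical helper: apply a boolean filter and remember what it removed."""
    for label, row in df.loc[~mask].iterrows():
        # Last known values of the dropped row, plus the step that dropped it.
        ghosts[label] = {"dropped_at_step": step, "values": row.to_dict()}
    return df.loc[mask]


df = pd.DataFrame({"x": [1, -2, 3]})
ghosts = {}
kept = filter_with_ghosts(df, df["x"] > 0, ghosts, step=1)
print(kept["x"].tolist(), ghosts)
```
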
### Fixed
- Recursion bug when accessing hidden column in COLUMN mode
- Config propagation issues
- Retention rate calculation for multi-table pipelines
- Export wrappers correctly strip hidden column