I’m never going back to pandas-only data pipelines.
All our bar data (1m, 5m, 1h) is stored in partitioned Parquet files. DuckDB gives me SQL access over them... filtering, joins, and feature pre-selection. Then Ray reads the filtered data in parallel and applies windowed feature extraction.
It’s modular, memory-safe, and bloody fast.
Feels like I finally have a modern data pipeline that doesn’t fall over every time I ask it to do something real.
[[Ray]] [[Serendipity]]