I’m never going back to pandas-only data pipelines. All our bar data (1m, 5m, 1h) is stored in partitioned Parquet files. DuckDB gives me SQL access over them... filtering, joins, and feature pre-selection. Then Ray reads the filtered data in parallel and applies windowed feature extraction. It’s modular, memory-safe, and bloody fast. Feels like I finally have a modern data pipeline that doesn’t fall over every time I ask it to do something real. [[Ray]] [[Serendipity]]