I used to have notebooks that depended on the output of other notebooks. Looked clean. Was a nightmare.
If notebook A failed silently, notebook B would train on partial data. And I wouldn’t know until something broke in production.
Now each notebook is atomic. Everything reads from DuckDB. If there’s a dependency, it’s a contract via file. No hidden globals. No chained runs.
The cost: a bit more I/O. The gain: sanity.
[[ML]] [[Serendipity]]