## Behind the curtains

### Data Management

- **DuckDB + Parquet** for high-speed data retrieval and low-latency storage. Perfect for OHLCV data and feature extraction.
- **ScyllaDB** for distributed, near-instantaneous reading and writing at scale. No waiting around.

### Model Training & Inference

- **[[AutoGluon]] + [[Ray]]** to train and execute models in real time. 1-minute, 5-minute, and 1-hour bars are processed, with lookbacks for deeper analysis.
- **[[NumPy]] + [[CuPy]]** for blazing-fast feature computation, on GPU when it matters.
- Real-time adjustments based on market conditions, with no manual intervention required.

![[model-training.png]]

### Monitoring & Observability

- **Prometheus + Grafana** for real-time metrics. If it moves, it's monitored.
- **Loki + Promtail** for journal logs across distributed nodes.
- Custom alerts and dashboards to track latency, trade execution, and failure points.

![[monitoring.png]]

### Deployment & Automation

- **Tailscale** for secure, low-latency VPN access to nodes.
- **Systemd + Cron** for scheduled job execution and fault recovery.
- **Notebook queue workers** that poll for execution tasks. Notebooks become workflows.

![[notebooks.png]]

### Infrastructure

- Running on multi-core (28 vCPU), high-memory (128 GB) local servers... because cloud costs way too much and Frankensteining labs is way more fun.
- **QNAP snapshotting** and **rsync** to NAS for fast, cheap recovery.

![[server-rack.jpeg]]
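For flavor, a latency alert like the ones mentioned above could be declared as a Prometheus rule. The metric name, threshold, and labels here are assumptions, not the actual dashboards:

```yaml
# Hypothetical alerting rule; metric and threshold are illustrative.
groups:
  - name: trading
    rules:
      - alert: HighOrderLatency
        expr: histogram_quantile(0.99, rate(order_latency_seconds_bucket[5m])) > 0.25
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "p99 order latency above 250ms"
```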
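Scheduled jobs with fault recovery under systemd are often wired as a oneshot service driven by a persistent timer (which re-runs missed jobs after downtime). Unit names, paths, and the rsync destination below are hypothetical:

```ini
# /etc/systemd/system/feature-sync.service  (name is illustrative)
[Unit]
Description=Sync features to NAS

[Service]
Type=oneshot
ExecStart=/usr/bin/rsync -a /data/features/ nas:/backups/features/

# /etc/systemd/system/feature-sync.timer
[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```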