## Behind the curtains
### Data Management
- **DuckDB + Parquet** for high-speed data retrieval and low-latency storage. Perfect for OHLCV data and feature extraction.
- **ScyllaDB** for distributed, near-instantaneous reading and writing at scale. No waiting around.
### Model Training & Inference
- **[[AutoGluon]] + [[Ray]]** to train models and run inference in real time. 1-minute, 5-minute, and 1-hour bars are processed, with longer lookbacks for deeper analysis.
- **[[NumPy]] + [[CuPy]]** for fast feature computation, shifting onto the GPU when it matters.
- Real-time adjustments based on market conditions, with no manual intervention required.
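A minimal sketch of the feature path (the feature itself, a rolling z-score of log returns, is illustrative, not one of the actual features). Because CuPy mirrors the NumPy API, swapping the import moves the same code onto the GPU, assuming a NumPy ≥ 1.20 and a CuPy recent enough to ship `sliding_window_view`:

```python
import numpy as xp  # swap for `import cupy as xp` to run on the GPU

def rolling_zscore(close, window: int = 20):
    """Rolling z-score of log returns, a common normalized momentum feature."""
    log_ret = xp.diff(xp.log(close))
    # (n - window + 1, window) sliding view over the returns, no copy made.
    windows = xp.lib.stride_tricks.sliding_window_view(log_ret, window)
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    # z-score of each return against its own trailing window
    return (log_ret[window - 1:] - mean) / std
```

Keeping the array module behind a single alias (`xp`) is what makes the CPU/GPU switch a one-line change instead of a rewrite.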
![[model-training.png]]
### Monitoring & Observability
- **Prometheus + Grafana** for real-time metrics. If it moves, it’s monitored.
- **Loki + Promtail** for aggregating journald logs across distributed nodes.
- Custom alerts and dashboards to track latency, trade execution, and failure points.
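For flavor, an alert of the kind described might look like this in Prometheus alerting-rule syntax (the metric name `trade_execution_latency_seconds` and the thresholds are hypothetical, not the real rules):

```yaml
groups:
  - name: trading
    rules:
      - alert: SlowTradeExecution
        # Fire when p99 execution latency stays above 250ms for 2 minutes.
        expr: histogram_quantile(0.99, sum(rate(trade_execution_latency_seconds_bucket[5m])) by (le)) > 0.25
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "Trade execution latency is degrading"
```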
![[monitoring.png]]
### Deployment & Automation
- **Tailscale** for secure, low-latency VPN access to nodes.
- **Systemd + Cron** for scheduled job execution and fault recovery.
- **Notebook queue workers** that poll for execution tasks. Notebooks become workflows.
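A stripped-down sketch of such a worker; the directory layout and the `jupyter nbconvert --execute` step are assumptions for illustration, not the actual implementation:

```python
import subprocess
import time
from pathlib import Path

QUEUE_DIR = Path("queue")  # notebooks dropped here become pending tasks
DONE_DIR = Path("done")    # executed notebooks are moved here

def pending_notebooks(queue_dir: Path) -> list:
    """Oldest-first list of queued notebooks waiting to run."""
    return sorted(queue_dir.glob("*.ipynb"), key=lambda p: p.stat().st_mtime)

def run_once() -> None:
    for nb in pending_notebooks(QUEUE_DIR):
        # Execute the notebook in place; any cell error fails the task.
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute",
             "--inplace", str(nb)],
            check=True,
        )
        nb.rename(DONE_DIR / nb.name)

# To run forever (e.g. under a systemd service that restarts on failure):
#   DONE_DIR.mkdir(exist_ok=True)
#   while True:
#       run_once()
#       time.sleep(5)
```

Using the filesystem as the queue keeps the worker trivially restartable: crash mid-run and the unfinished notebook is still sitting in `queue/` on the next poll.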
![[notebooks.png]]
### Infrastructure
- Running on multi-core (28 vCPU), high-memory (128 GB) local servers... because the cloud costs way too much and Frankensteining a homelab is way more fun.
- **QNAP snapshotting** and **rsync** to NAS for fast, cheap recovery.
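The rsync leg can be as simple as a nightly crontab entry (paths and host here are placeholders): QNAP snapshots cover point-in-time rollback on the box itself, while rsync keeps an off-box copy on the NAS.

```shell
# Nightly at 02:30: mirror the data volume to the NAS, pruning deleted files.
30 2 * * * rsync --archive --delete --partial /srv/trading/data/ nas:/backups/trading/
```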
![[server-rack.jpeg]]