**Authors:** Xiang Liu, Luanqi Liu, Feng Tian, Yiman Zhao, Xianyong Dai, Xiaolin Liu, Fujun Zhang
**Published:** _2024 International Seminar on Artificial Intelligence, Computer Technology and Control Engineering (ACTCE)_
**Link:** [IEEE Xplore](https://doi.org/10.1109/ACTCE65085.2024.00098)
---
### Summary
This paper introduces #FTTransformer, a neural network model for predicting data center resource utilization. It pairs a Transformer encoder with a multi-layer #LSTM decoder to improve prediction accuracy on data center time series.
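A minimal PyTorch sketch of that encoder-decoder pairing. The class name, layer counts, and dimensions below are illustrative assumptions rather than the paper's exact configuration, and positional encoding is omitted for brevity:

```python
import torch.nn as nn

class FTTransformerSketch(nn.Module):
    """Illustrative Transformer-encoder + multi-layer LSTM-decoder
    forecaster for a univariate utilization series (sizes assumed)."""

    def __init__(self, d_model=64, n_heads=4, n_enc_layers=2,
                 n_dec_layers=2, horizon=12):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # scalar utilization -> d_model
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_enc_layers)
        self.decoder = nn.LSTM(d_model, d_model,
                               num_layers=n_dec_layers, batch_first=True)
        self.head = nn.Linear(d_model, 1)         # hidden state -> prediction
        self.horizon = horizon

    def forward(self, x):                          # x: (batch, seq_len, 1)
        h = self.encoder(self.input_proj(x))       # long-range temporal context
        out, _ = self.decoder(h)                   # sequential LSTM decoding
        return self.head(out[:, -self.horizon:, :])  # last `horizon` steps
```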
---
### Key Points
- **Problem:** Traditional time series prediction methods (like #ARIMA and #SVM) often fail to capture the **complexity** and **long-range dependencies** in data center resource usage.
- **Solution:** The **FTTransformer** model, which:
- Uses **Transformer encoders** to capture long-term dependencies.
- Decodes with a **multi-layer LSTM** and fine-tunes the model for accurate time series prediction.
- Applies **data augmentation** techniques (time slice translation and removal) to improve generalization.
- **Performance:**
- Tested on the **Alibaba Cluster Tracking Dataset**.
- Achieved a **2.13% improvement in RMSE** and a **2.3% improvement in NSE** over comparison models (LSTM, GRUED, LSTNet, and a basic Transformer); both metrics are defined in the sketch after this list.
- The FTTransformer model shows better consistency between predicted and actual values compared to other deep learning models.
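For reference, the two reported metrics under their standard definitions (RMSE and Nash-Sutcliffe Efficiency; the paper's exact formulation is not reproduced here):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def nse(y_true, y_pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit,
    0 means no better than predicting the series mean."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```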
---
### How It Works
1. **Pre-Training:** Uses **supervised learning** to enhance the model’s ability to capture temporal features.
2. **Fine-Tuning:** Further adapts the model on **real-time data**, updating hidden states based on the similarity between predictions and ground-truth labels.
3. **Data Augmentation** (see the sketch after this list):
- **Time Slice Translation:** Shifts segments along the time axis to simulate temporal displacement.
- **Time Slice Removal:** Removes segments to mimic data loss or noise.
4. **Encoder-Decoder Structure:** Combines **Transformer encoding** with **LSTM decoding** to improve accuracy.
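A hedged sketch of the two augmentations, assuming they operate on 1-D utilization arrays; the shift range, slice length, and boundary handling are illustrative choices, not the paper's:

```python
import numpy as np

def time_slice_translation(series, max_shift=5, rng=None):
    """Shift the series along the time axis to simulate temporal drift.
    np.roll is a simple stand-in for whatever boundary handling the paper uses."""
    rng = rng or np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(series, shift)

def time_slice_removal(series, slice_len=5, rng=None):
    """Delete a contiguous segment to mimic data loss or noisy gaps."""
    rng = rng or np.random.default_rng()
    start = int(rng.integers(0, len(series) - slice_len))
    return np.concatenate([series[:start], series[start + slice_len:]])
```

In training, either transform would be applied to input windows before batching, enlarging the effective dataset and exposing the model to shifted and gappy traces.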
---
### Strengths
- **High Accuracy:** Outperforms existing methods, especially in long-term resource prediction.
- **Efficient Generalization:** Uses data augmentation to adapt to diverse data patterns.
- **Adaptability:** Capable of fine-tuning based on new data, making it more robust in dynamic environments.
---
### Limitations
- **Complexity:** The model structure is more complicated compared to traditional methods.
- **Data Specificity:** Tested mainly on **Alibaba’s data center data**, so generalizability to other data centers may vary.
- **Resource Intensive:** The multi-layer LSTM decoder can be computationally expensive.