**Authors:** Xiang Liu, Luanqi Liu, Feng Tian, Yiman Zhao, Xianyong Dai, Xiaolin Liu, Fujun Zhang
**Published:** _2024 International Seminar on Artificial Intelligence, Computer Technology and Control Engineering (ACTCE)_
**Link:** [IEEE Xplore](https://doi.org/10.1109/ACTCE65085.2024.00098)
---
### Summary
This paper introduces #FTTransformer, a neural network model for predicting data center resource utilization. It pairs a Transformer encoder with a multi-layer #LSTM decoder to improve prediction accuracy on data center time series.
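A minimal PyTorch sketch of that encoder-decoder pairing. The class name, layer counts, and dimensions below are illustrative assumptions rather than the paper's exact configuration, and positional encoding is omitted for brevity:

```python
import torch.nn as nn

class FTTransformerSketch(nn.Module):
    """Illustrative Transformer-encoder + multi-layer LSTM-decoder
    forecaster for a univariate utilization series (sizes assumed)."""

    def __init__(self, d_model=64, n_heads=4, n_enc_layers=2,
                 n_dec_layers=2, horizon=12):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)   # scalar utilization -> d_model
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_enc_layers)
        self.decoder = nn.LSTM(d_model, d_model,
                               num_layers=n_dec_layers, batch_first=True)
        self.head = nn.Linear(d_model, 1)         # hidden state -> prediction
        self.horizon = horizon

    def forward(self, x):                          # x: (batch, seq_len, 1)
        h = self.encoder(self.input_proj(x))       # long-range temporal context
        out, _ = self.decoder(h)                   # sequential LSTM decoding
        return self.head(out[:, -self.horizon:, :])  # last `horizon` steps
```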
---
### Key Points
- **Problem:** Traditional time series prediction methods (like #ARIMA and #SVM) often fail to capture the **complexity** and **long-range dependencies** in data center resource usage.
- **Solution:** The **FTTransformer** model, which:
- Uses **Transformer encoders** to capture long-term dependencies.
- Decodes with a **multi-layer LSTM** and fine-tunes the model for accurate time series prediction.
- Applies **data augmentation** techniques (time slice translation and removal) to improve generalization.
- **Performance:**
- Tested on the **Alibaba Cluster Tracking Dataset**.
- Achieved a **2.13% improvement in RMSE** and a **2.3% improvement in NSE** over comparison models (LSTM, GRUED, LSTNet, and a basic Transformer); both metrics are defined in the sketch after this list.
- The FTTransformer model shows better consistency between predicted and actual values compared to other deep learning models.
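For reference, the two reported metrics under their standard definitions (RMSE and Nash-Sutcliffe Efficiency; the paper's exact formulation is not reproduced here):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def nse(y_true, y_pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit,
    0 means no better than predicting the series mean."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```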
---
### How It Works
1. **Pre-Training:** Uses **supervised learning** to enhance the model’s ability to capture temporal features.
2. **Fine-Tuning:** Further adapts the model on **real-time data**, updating hidden states based on the similarity between predictions and ground-truth labels.
3. **Data Augmentation** (see the sketch after this list):
- **Time Slice Translation:** Shifts segments along the time axis to simulate temporal displacement.
- **Time Slice Removal:** Removes segments to mimic data loss or noise.
4. **Encoder-Decoder Structure:** Combines **Transformer encoding** with **LSTM decoding** to improve accuracy.
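A hedged sketch of the two augmentations, assuming they operate on 1-D utilization arrays; the shift range, slice length, and boundary handling are illustrative choices, not the paper's:

```python
import numpy as np

def time_slice_translation(series, max_shift=5, rng=None):
    """Shift the series along the time axis to simulate temporal drift.
    np.roll is a simple stand-in for whatever boundary handling the paper uses."""
    rng = rng or np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(series, shift)

def time_slice_removal(series, slice_len=5, rng=None):
    """Delete a contiguous segment to mimic data loss or noisy gaps."""
    rng = rng or np.random.default_rng()
    start = int(rng.integers(0, len(series) - slice_len))
    return np.concatenate([series[:start], series[start + slice_len:]])
```

In training, either transform would be applied to input windows before batching, enlarging the effective dataset and exposing the model to shifted and gappy traces.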
---
### Strengths
- **High Accuracy:** Outperforms existing methods, especially in long-term resource prediction.
- **Efficient Generalization:** Uses data augmentation to adapt to diverse data patterns.
- **Adaptability:** Capable of fine-tuning based on new data, making it more robust in dynamic environments.
---
### Limitations
- **Complexity:** The model structure is more complicated compared to traditional methods.
- **Data Specificity:** Tested mainly on **Alibaba’s data center data**, so generalizability to other data centers may vary.
- **Resource Intensive:** The multi-layer LSTM decoder can be computationally expensive.