**Title:** Revisiting Deep Learning Models for Tabular Data
**Authors:** Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko
**Published:** NeurIPS 2021
**Link:** [GitHub](https://github.com/yandex-research/tabular-dl-revisiting-models)
---
### **What is the Paper About?**
This paper provides a **comprehensive comparison** of deep learning (DL) models for **tabular data**, a data modality that is common in industry and competitions. It introduces two strong baselines: a **ResNet-like model** and the **FT-Transformer**, a simple adaptation of the Transformer architecture, both intended to serve as reliable reference points across a wide range of tabular tasks.
---
### **Key Points:**
- **Problem:**
- Many DL models for tabular data exist, but they’re not tested under **consistent benchmarks**, making comparisons unreliable.
- There’s a **lack of simple, effective baselines** for tabular DL.
- **Proposed Baselines:**
- **ResNet-like Model:** A simplified architecture with skip connections, adapted from computer vision.
- **FT-Transformer:** An adaptation of the Transformer that embeds each feature (numerical and categorical) into a token and processes the tokens with self-attention, inspired by its success in NLP.
- **Findings:**
- **ResNet is a surprisingly strong baseline:** none of the previously proposed DL models consistently outperform it, so it should be used as a reference point in future work.
- **FT-Transformer** outperforms other DL architectures on most tasks and is a more **universal** architecture.
- **Gradient Boosted Decision Trees (GBDT)** like XGBoost and [[LightGBM]] **still outperform** deep learning models in many tabular cases.
---
### **How It Works:**
1. **MLP Baseline:** A simple feedforward model with ReLU activations and dropout (see the MLP sketch after this list).
2. **ResNet Block:** Adds skip connections and batch normalization so that deeper models remain trainable on tabular data (see the ResNet sketch below).
3. **FT-Transformer:** Embeds each feature as a token and processes the tokens with self-attention layers, using a [CLS] token for prediction (see the FT-Transformer sketch below).
4. **Evaluation:** Conducted on a diverse set of real-world tabular datasets under a unified training and hyperparameter-tuning protocol.
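
A minimal PyTorch sketch of the MLP baseline. The class name, layer sizes, and dropout rate are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class TabularMLP(nn.Module):
    """Feedforward baseline: repeated Linear -> ReLU -> Dropout blocks."""
    def __init__(self, d_in, d_hidden=256, n_blocks=3, d_out=1, dropout=0.1):
        super().__init__()
        layers, d = [], d_in
        for _ in range(n_blocks):
            layers += [nn.Linear(d, d_hidden), nn.ReLU(), nn.Dropout(dropout)]
            d = d_hidden
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(d, d_out)  # regression output or classification logits

    def forward(self, x):
        return self.head(self.body(x))
```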
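
A hedged sketch of one ResNet-style block: batch normalization, two linear layers with dropout, and an identity skip connection. The exact layer ordering in the paper may differ slightly:

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """One residual block for tabular inputs: x + f(BatchNorm(x))."""
    def __init__(self, d, d_hidden, dropout=0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.lin1 = nn.Linear(d, d_hidden)
        self.lin2 = nn.Linear(d_hidden, d)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        h = self.drop(torch.relu(self.lin1(self.norm(x))))
        return x + self.drop(self.lin2(h))  # skip connection keeps gradients flowing
```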
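
A simplified FT-Transformer sketch for numerical features only (the paper also embeds categorical features via lookup tables). The tokenizer turns each scalar feature into a learned token, a [CLS] token is prepended, and the head reads the [CLS] output; all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    """Map each numerical feature x_i to a token: x_i * W_i + b_i."""
    def __init__(self, n_features, d_token):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_features, d_token))
        self.bias = nn.Parameter(torch.randn(n_features, d_token))
        self.cls = nn.Parameter(torch.randn(1, 1, d_token))  # learned [CLS] token

    def forward(self, x_num):                                   # (batch, n_features)
        tokens = x_num.unsqueeze(-1) * self.weight + self.bias  # (batch, n, d_token)
        cls = self.cls.expand(x_num.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1)                  # prepend [CLS]

class FTTransformerSketch(nn.Module):
    def __init__(self, n_features, d_token=64, n_heads=8, n_layers=3, d_out=1):
        super().__init__()
        self.tokenizer = FeatureTokenizer(n_features, d_token)
        layer = nn.TransformerEncoderLayer(d_token, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_token, d_out)

    def forward(self, x_num):
        tokens = self.encoder(self.tokenizer(x_num))
        return self.head(tokens[:, 0])                          # predict from [CLS]
```

For example, `FTTransformerSketch(n_features=8)(torch.randn(32, 8))` produces a `(32, 1)` output.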
---
### **Strengths:**
- Brings **clarity and reproducibility** to comparisons in tabular DL.
- Identifies simple but **high-performing architectures** (ResNet, FT-Transformer).
- Provides **open-source implementations** and benchmarks.
---
### **Limitations:**
- There is no universally best model: **GBDT still dominates** in many scenarios.
- **Transformers require more compute** and tuning than tree-based models.
- **Categorical features** need special preprocessing (e.g., learned embeddings) for DL models.