**Authors:** Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko
**Published:** NeurIPS 2021
**Link:** [GitHub](https://github.com/yandex-research/tabular-dl-revisiting-models)

---

### **What is the Paper About?**

This paper provides a **comprehensive comparison** of deep learning (DL) models for **tabular data**, a data modality common in industry and machine-learning competitions. It introduces two strong baselines: a **ResNet-like model** and **FT-Transformer**, a simple adaptation of the Transformer, both designed to perform well across a wide range of tabular tasks.

---

### **Key Points:**

- **Problem:**
  - Many DL models for tabular data exist, but they are not evaluated under **consistent benchmarks**, making comparisons between them unreliable.
  - There is a **lack of simple, effective baselines** for tabular DL.
- **Proposed Baselines:**
  - **ResNet-like Model:** A simplified architecture with skip connections, adapted from computer vision.
  - **FT-Transformer:** A variant of the Transformer that embeds each tabular feature as a token, inspired by its success in NLP.
- **Findings:**
  - The **ResNet performs better than most existing DL models** and should serve as a new baseline.
  - **FT-Transformer** outperforms the other DL architectures on most tasks and is the most **universal** of them.
  - **Gradient Boosted Decision Trees (GBDT)** such as XGBoost and [[LightGBM]] **still outperform** deep learning models on many tabular datasets.

---

### **How It Works:**

1. **MLP Baseline:** A simple feedforward model built from Linear, ReLU, and dropout layers.
2. **ResNet Block:** Adds skip connections and batch normalization, enabling deeper models for tabular data.
3. **FT-Transformer:** Embeds each feature (numerical and categorical) as a token and processes the tokens with self-attention layers, using a [CLS] token for prediction.
4. **Evaluation:** Conducted on 11 real-world tabular datasets under a unified training and hyperparameter-tuning protocol.

Minimal PyTorch sketches of the three architectures are given at the end of this note.

---

### **Strengths:**

- Brings **clarity and reproducibility** to comparisons in tabular DL.
- Identifies simple but **high-performing architectures** (ResNet, FT-Transformer).
- Provides **open-source implementations** and benchmarks.

---

### **Limitations:**

- No universally best model exists; **GBDT still dominates** in many scenarios.
- **Transformers require more compute** and tuning than tree-based models.
- **Categorical features** need special preprocessing (e.g., embeddings) for DL models.
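---

### **Minimal PyTorch Sketches:**

The sketches below illustrate the three architectures from "How It Works". They are simplified approximations, not the authors' exact implementations (see the linked repo for the official code); all widths, depths, and dropout rates are illustrative assumptions. First, the MLP baseline as a stack of Linear, ReLU, and dropout layers:

```python
import torch
import torch.nn as nn

class TabularMLP(nn.Module):
    """MLP baseline: a stack of Linear -> ReLU -> Dropout blocks."""
    def __init__(self, d_in: int, d_hidden: int = 256, n_blocks: int = 3,
                 dropout: float = 0.1, d_out: int = 1):
        super().__init__()
        layers, d = [], d_in
        for _ in range(n_blocks):
            layers += [nn.Linear(d, d_hidden), nn.ReLU(), nn.Dropout(dropout)]
            d = d_hidden
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(d_hidden, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.body(x))

# Usage: 32 samples with 10 numerical features -> one regression target each.
model = TabularMLP(d_in=10)
print(model(torch.randn(32, 10)).shape)  # torch.Size([32, 1])
```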
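The ResNet-like model stacks residual blocks in a pre-activation layout, where each block computes `x + Dropout(Linear(Dropout(ReLU(Linear(BatchNorm(x))))))`. A sketch of one block (hidden width and dropout rates are assumptions):

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    """One residual block: the skip connection lets the block learn a
    correction on top of its input, which keeps deeper stacks trainable."""
    def __init__(self, d: int, d_hidden: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.BatchNorm1d(d)
        self.linear1 = nn.Linear(d, d_hidden)
        self.linear2 = nn.Linear(d_hidden, d)
        self.drop1 = nn.Dropout(dropout)
        self.drop2 = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.norm(x)                             # pre-activation BatchNorm
        z = self.drop1(torch.relu(self.linear1(z)))
        z = self.drop2(self.linear2(z))
        return x + z                                 # skip connection

# Usage: project the raw features once, then stack blocks at a fixed width.
stem = nn.Linear(10, 128)
blocks = nn.Sequential(ResNetBlock(128, 256), ResNetBlock(128, 256))
print(blocks(stem(torch.randn(32, 10))).shape)  # torch.Size([32, 128])
```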
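Finally, the core FT-Transformer idea, shown here for numerical features only: each scalar feature `x_i` becomes a token `x_i * W_i + b_i` via a per-feature embedding, a learned [CLS] token is appended, and the prediction is read off the [CLS] output. The full model also embeds categorical features via lookup tables; the token dimension, head count, and depth below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FTTransformerSketch(nn.Module):
    """Feature Tokenizer + Transformer, numerical features only."""
    def __init__(self, n_features: int, d_token: int = 64,
                 n_heads: int = 4, n_layers: int = 2, d_out: int = 1):
        super().__init__()
        # Per-feature embedding: one weight/bias vector per scalar feature.
        self.weight = nn.Parameter(torch.randn(n_features, d_token) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_features, d_token))
        self.cls = nn.Parameter(torch.randn(1, 1, d_token) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=n_heads,
            batch_first=True, norm_first=True)  # the paper uses pre-norm
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_token, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> tokens: (batch, n_features, d_token)
        tokens = x.unsqueeze(-1) * self.weight + self.bias
        cls = self.cls.expand(x.shape[0], -1, -1)
        tokens = torch.cat([tokens, cls], dim=1)        # append [CLS]
        return self.head(self.encoder(tokens)[:, -1])   # predict from [CLS]

model = FTTransformerSketch(n_features=10)
print(model(torch.randn(32, 10)).shape)  # torch.Size([32, 1])
```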