
“TabGLM: Tabular Graph Language Model for Learning Transferable Representations through Multi-Modal Consistency Minimization” accepted at AAAI 2025

Introduction

Hello! I am Maria Xenochristou, a Senior Researcher at the Artificial Intelligence Laboratory of Fujitsu Research of America. I study deep learning architectures for tabular data, as well as table retrieval and QA.

Recently, our research paper "TabGLM: Tabular Graph Language Model for Learning Transferable Representations through Multi-Modal Consistency Minimization" was accepted at AAAI 2025, one of the top conferences in AI.

  • Title: "TabGLM: Tabular Graph Language Model for Learning Transferable Representations through Multi-Modal Consistency Minimization"
  • Authors: Anay Majee*† (Fujitsu Research of America, University of Texas at Dallas), Maria Xenochristou* (Fujitsu Research of America), Wei-Peng Chen (Fujitsu Research of America)
    *These authors contributed equally.
    †Anay Majee is a PhD student at the University of Texas at Dallas. Work was performed during an internship at Fujitsu Research of America.
  • Conference: Association for the Advancement of Artificial Intelligence (AAAI) 2025

TL;DR: TabGLM is a multi-modal deep learning model that transforms tabular data into graph and text representations, capturing both structural and semantic information. By aligning embeddings through a joint semi-supervised objective, it achieves state-of-the-art performance on heterogeneous tabular datasets while maintaining a lightweight architecture (Fig. 1).

Figure 1: Semi-Supervised Multi-Modal Tabular Deep Learning in TabGLM. We propose a joint graph-language method that can effectively learn from heterogeneous, real-world tabular datasets by integrating structural and semantic information.

Overview

Background

Tabular data forms the backbone of many industries, such as healthcare, finance, and e-commerce. It is one of the most prevalent data structures, representing real-world information in rows and columns, often with a mix of numerical, categorical, and text-based features.

Despite its importance, learning from tabular data comes with unique challenges. Tabular datasets are often small, with diverse feature types, and unlike images or natural language, they lack the inherent inductive biases that aid learning, making it harder for models to generalize effectively.

Thus, existing deep learning approaches often fall short. High parameter counts can result in overfitting on limited data, and methods that transform tabular data into images, text, or graphs allow for advanced techniques but fail to simultaneously capture the semantic richness and structural relationships critical for effective representation and learning.

Proposed Method

To address these challenges, we propose TabGLM (Tabular Graph Language Model), a multi-modal architecture designed to model both structural and semantic information by transforming each row of a tabular dataset into both a fully connected graph and serialized text. By combining these representations, TabGLM learns comprehensive feature embeddings that leverage the strengths of both modalities.
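
To make the dual transformation concrete, here is a minimal sketch of how a single row could be serialized into text and turned into a fully connected feature graph. The serialization template and the plain-Python graph representation are illustrative assumptions rather than the exact implementation used in TabGLM.

```python
# A minimal sketch (not the paper's released code) of the dual row
# transformation: serialization into natural text for the frozen text
# encoder, and a fully connected graph whose nodes are the row's features.
# The "column is value." template and the plain-Python graph structure
# are illustrative assumptions.
from itertools import combinations

def row_to_text(row: dict) -> str:
    # e.g. {"age": 42, "job": "teacher"} -> "age is 42. job is teacher."
    return " ".join(f"{col} is {val}." for col, val in row.items())

def row_to_graph(row: dict):
    # Nodes are individual features; every pair of nodes is connected so
    # the GNN can model arbitrary feature interactions.
    nodes = list(row.items())
    edges = list(combinations(range(len(nodes)), 2))
    return nodes, edges

row = {"age": 42, "job": "teacher", "balance": 1530.0}
print(row_to_text(row))
print(row_to_graph(row))
```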

Key components:

  1. Text Transformation: The text pipeline serializes rows into natural text and then encodes them into embeddings, utilizing pretrained text encoders such as TAPAS [1] and TAPEX [2], which are kept frozen during training.
  2. Graph Transformation: The graph pipeline constructs graphs where nodes represent features and edges capture their relationships. A Graph Neural Network (GNN) encodes these graphs into graph embeddings.
  3. Multi-Modal Consistency Learner (MUCOSA): At the core of TabGLM is MUCOSA, our Multi-Modal Consistency Learner. The primary goals of MUCOSA are to align graph and text embeddings, regularize the model to prevent overfitting, and ensure effective prediction of target labels. MUCOSA achieves this by minimizing two complementary losses (sketched in code after this list).
    • Consistency Loss: Measures how well the graph and text embeddings align in a shared embedding space, in a label-free fashion.
    • Supervised Loss: Measures the discrepancy between the ground truth and the predicted logits from the classifier head. The classifier head consumes only the graph embeddings to mimic the inference setting.
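
The sketch below illustrates this joint objective in PyTorch: a label-free consistency term that pulls the graph and text embeddings together, plus a supervised term computed from the graph-only classifier head. The cosine-based alignment and the weighting factor lam are assumptions made for illustration; the paper's exact formulation may differ.

```python
# A hedged sketch of the MUCOSA objective in PyTorch. The cosine-based
# alignment term and the weighting factor `lam` are assumptions for
# illustration; the paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def mucosa_loss(graph_emb, text_emb, logits, labels, lam=1.0):
    # Consistency loss: align graph and text embeddings in a shared
    # space without using any labels.
    consistency = 1.0 - F.cosine_similarity(graph_emb, text_emb, dim=-1).mean()
    # Supervised loss: cross-entropy on logits from the classifier head,
    # which consumes only the graph embeddings (as at inference time).
    supervised = F.cross_entropy(logits, labels)
    return supervised + lam * consistency

# Toy usage with random tensors standing in for encoder outputs.
graph_emb = torch.randn(8, 128)
text_emb = torch.randn(8, 128)
logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
print(mucosa_loss(graph_emb, text_emb, logits, labels).item())
```

Because the supervised term only sees the graph-branch logits, the frozen text encoder effectively acts as a regularizing signal during training rather than a component needed at test time.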

Figure 2: Overview of our TabGLM framework. TabGLM introduces Multi-modal Graph-Language Modeling to enable tabular learning on datasets with heterogeneous data types. Our method leverages graph and language embeddings, consistency regularization, and supervised learning to effectively adapt to diverse real-world downstream tasks.

Training and Inference:

  • During training, the graph encoder is optimized, while the text encoder remains frozen.
  • During inference, the model relies solely on graph embeddings for efficiency (see the sketch after this list).
  • TabGLM uses only 336M parameters, making it more computationally efficient than state-of-the-art models like TabLLM [3].
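
The snippet below gives a rough, self-contained illustration of this asymmetry: the text encoder is frozen, only the graph encoder and classifier head are optimized, and inference uses the graph branch alone. The placeholder modules are hypothetical stand-ins, not the actual TabGLM layers.

```python
# A rough illustration of the training/inference asymmetry. The nn.Linear
# modules are placeholders for the pretrained text encoder, the GNN graph
# encoder, and the classifier head; they are not the actual TabGLM layers.
import torch
import torch.nn as nn

text_encoder = nn.Linear(32, 128)   # stand-in for the frozen table-text encoder
graph_encoder = nn.Linear(32, 128)  # stand-in for the GNN
classifier = nn.Linear(128, 2)      # classifier head on graph embeddings

# Training: freeze the text encoder; only the graph encoder and classifier
# head receive gradient updates.
for p in text_encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(
    list(graph_encoder.parameters()) + list(classifier.parameters()), lr=1e-3
)

# Inference: the text branch is dropped; predictions come from the graph
# embeddings alone.
def predict(x):
    with torch.no_grad():
        return classifier(graph_encoder(x)).argmax(dim=-1)

print(predict(torch.randn(4, 32)))
```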

Results

Performance Comparison

We evaluated TabGLM on 25 benchmark datasets against both traditional machine learning models and state-of-the-art deep learning approaches. The results, averaged over five seeds to ensure reliability and consistency, cover both binary and multi-class classification tasks.

Overall, TabGLM delivers an average of 2.7% higher AUC-ROC over state-of-the-art methods, setting a new standard for tabular learning.

  1. Traditional Models: TabGLM outperforms traditional machine learning models, achieving a significant increase in AUC-ROC of 4.77% over Logistic Regression (LR) and 2.51% over Random Forest (RF). Tree-based models still excel on simpler datasets, such as kr-vs-kp and pc3 (Table 1).
  2. Deep Learning Models: TabGLM consistently outperforms tabular DL models like FT-Transformer [4] by 5.56%, TabTransformer [5] by 3.64% and NODE [6] by 1.26% (Table 1).
  3. TabLLM benchmark: Compared to uni-modal architectures (e.g., IGNNet [7] for graphs, TabLLM for text), TabGLM outperforms TabLLM by 1.35% and IGNNet by 7.96% (Table 2).

Table 1: Comparison of performance (AUC-ROC) of existing approaches in tabular Machine Learning against TabGLM. Our proposed method TabGLM achieves significant performance gains across 25 classification datasets. The best performing model is colored in dark blue while the second best is colored in light blue.

Table 2: Comparison of performance (AUC-ROC) of TabGLM on the benchmark datasets from TabLLM (Hegselmann et al. 2023). Results from all methods are averaged over five seeds.

Ablation Studies

In our ablation studies, we show the contributions of different components in TabGLM.

Multi-Modal vs. Uni-Modal Training

First, we compare TabGLM's multi-modal training to its unimodal counterparts. In the graph-only pipeline, we train both the graph encoder and classifier, while in the text-only pipeline, we freeze the text encoder and train only the classifier. This mirrors TabGLM, where the text encoder remains frozen during training, and ensures a fair comparison.

Experiments on three representative datasets—pc3 (numerical), bank (balanced numerical and categorical), and creditg (categorical-heavy)—show that TabGLM’s multi-modal design consistently outperforms its unimodal variants, underscoring the value of modality fusion for learning from heterogeneous tables (Table 3).

Table 3: Ablations on the graph and text components of the proposed TabGLM approach. Results are averaged over five seeds.

Choice of Text Encoder

For the choice of text encoder in the text transformation module, we investigate the impact of different encoders, including TAPAS [1], TAPEX [2], and TabLLM [3].

TAPAS, with only 129 million parameters, achieves the best AUC-ROC scores with minimal computational overhead, making it the optimal choice for TabGLM. This makes our approach highly efficient, with an 80% reduction in parameter count compared to state-of-the-art deep learning models, such as TabLLM (Table 4).

Table 4: Ablation on the choice of LLM architecture for the text transformation module of TabGLM.

Conclusion

TabGLM represents a pivotal advancement in deep learning for tabular data by addressing the inherent heterogeneity of these datasets. It combines fully connected graph representations with serialized text, leveraging a graph neural network and a pretrained text encoder to capture both structural and semantic features. Its joint multi-modal learning objective enhances generalization, while the efficient architecture, with fewer parameters than state-of-the-art models, handles diverse feature types. Evaluations on 25 benchmark datasets demonstrate significant AUC-ROC improvements, showcasing TabGLM’s contribution in advancing multi-modal learning for tabular data across domains.

References

[1] Herzig, J.; Nowak, P. K.; Müller, T.; Piccinno, F.; and Eisenschlos, J. 2020. TaPas: Weakly Supervised Table Parsing via Pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
[2] Liu, Q.; Chen, B.; Guo, J.; Ziyadi, M.; Lin, Z.; Chen, W.; and Lou, J.-G. 2022. TAPEX: Table Pre-training via Learning a Neural SQL Executor. In International Conference on Learning Representations.
[3] Hegselmann, S.; Buendia, A.; Lang, H.; Agrawal, M.; Jiang, X.; and Sontag, D. 2023. TabLLM: Few-shot Classification of Tabular Data with Large Language Models. In International Conference on Artificial Intelligence and Statistics, 5549–5581. PMLR.
[4] Gorishniy, Y.; Rubachev, I.; Khrulkov, V.; and Babenko, A. 2021. Revisiting deep learning models for tabular data. Advances in Neural Information Processing Systems, 34: 18932–18943.
[5] Huang, X.; Khetan, A.; Cvitkovic, M.; and Karnin, Z. 2020. TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv:2012.06678.
[6] Popov, S.; Morozov, S.; and Babenko, A. 2019. Neural oblivious decision ensembles for deep learning on tabular data. arXiv preprint arXiv:1909.06312.
[7] Alkhatib, A.; Ennadir, S.; Boström, H.; and Vazirgiannis, M. 2024. Interpretable Graph Neural Networks for Tabular Data. In ICLR 2024 Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science.