
Please note that this blog post has been translated using machine translation.
Introduction
Hello, I'm Ichiba from the Computing Laboratory at Fujitsu Research. I attended the international conference SC25, held in St. Louis, USA, from November 16 to 21, 2025, and in this article I report on it. Various departments within Fujitsu plan to publish a four-part series of reports on SC25; this article focuses primarily on the paper presentations I found particularly noteworthy. At SC, a wide range of organizations, including companies, universities, and research institutions, hold exhibits, and Fujitsu was among the exhibitors. Details of Fujitsu's exhibit will be covered in separate articles.
My primary objective in attending was to explore the latest trends in HPC. Since I am currently stationed in Toronto, Canada, the trip was considerably easier than it would have been from Japan: the flight took only about two and a half hours. The fact that the venue was closer than in previous years played a significant part in my decision, and since I had never attended SC before, I took this opportunity to participate.

Overview of SC
SC is the world's largest international conference in the field of High-Performance Computing (HPC); its official name is the "International Conference for High Performance Computing, Networking, Storage, and Analysis." In addition to paper presentations and corporate exhibitions, the latest editions of prominent supercomputer rankings such as the TOP500 are announced there, and awards including the Gordon Bell Prize are presented.
SC25 drew over 16,500 participants *1 and received 623 paper submissions (a 34% increase over the previous year), of which 136 were accepted, for an acceptance rate of 22% *2. The number of submissions set a new record for the second consecutive year, with submissions related to HPC for Machine Learning doubling.

Noteworthy Paper Presentations
While I have not reviewed all the papers, I would like to introduce a few that caught my interest. I focused on compiler technologies for AI computation and on optimization methods that exploit low-precision arithmetic and Sparse Tensor Cores.
High-Performance and Power-Efficient Emulation of Matrix Multiplication using INT8 Matrix Engines
This paper proposes a method for performing higher-precision matrix multiplications such as DGEMM (double precision) and SGEMM (single precision) on low-precision INT8 matrix engines. It achieves speedups of up to 1.4x for DGEMM and 3.0x for SGEMM over native implementations using cuBLAS. I knew that low-precision arithmetic is important for AI computation, but the insight that low-precision hardware can be used to accelerate higher-precision operations was surprising to me. The paper was presented at ScalAH'25 (16th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems) and left the strongest impression on me of anything I saw this time.
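As a rough illustration of the underlying idea (not the paper's actual algorithm, which builds on more sophisticated splitting and error analysis), a higher-precision matrix product can be emulated by decomposing each operand into several int8 "digit" slices, multiplying the slices with the int8-input, int32-accumulate operation a matrix engine provides, and recombining the scaled partial products. The NumPy sketch below is a hypothetical simplification; the slice count, digit width, and scaling scheme are my own choices:

```python
import numpy as np

BITS = 6      # bits per digit slice: round(1.0 * 2**6) = 64 fits in int8
SLICES = 4    # 4 slices * 6 bits = 24 bits of mantissa, close to float32

def split_int8(M, slices=SLICES, bits=BITS):
    """Split a float matrix into int8 digit slices so that
    M ~= scale * sum_i D_i / 2**(bits * (i + 1))."""
    scale = float(np.max(np.abs(M)))
    if scale == 0.0:
        scale = 1.0
    t = M / scale                       # normalized into [-1, 1]
    digits = []
    for _ in range(slices):
        d = np.round(t * 2**bits)       # extract the next digit
        digits.append(d.astype(np.int8))
        t = t * 2**bits - d             # residual carried to the next slice
    return scale, digits

def emulated_gemm(A, B):
    """Emulate a higher-precision GEMM using only int8 x int8 -> int32
    partial products, the operation an INT8 matrix engine provides."""
    sa, Da = split_int8(A)
    sb, Db = split_int8(B)
    C = np.zeros((A.shape[0], B.shape[1]))
    for i, da in enumerate(Da):
        for j, db in enumerate(Db):
            # int32 accumulation mimics the matrix engine's output
            P = da.astype(np.int32) @ db.astype(np.int32)
            C += P / 2.0 ** (BITS * (i + j + 2))
    return sa * sb * C
```

Note that this naive version performs all slice-pair products; practical schemes presumably skip pairs whose contribution falls below the target precision, which is part of where speedups over native high-precision GEMM would come from.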
PerfDojo: Automated ML Library Generation for Heterogeneous Architectures
Machine learning libraries must be optimized for the characteristics of the target hardware, but doing this by hand is becoming increasingly difficult. This paper proposes a method that combines reinforcement learning (RL) and large language models (LLMs) to represent and optimize programs. A key feature is that the optimizer does not rely on hardware-specific information; instead, it uses the measured results of executing candidates on the hardware as the learning reward. The authors argue that this removes the need for prior hardware knowledge. The paper contains many other interesting ideas as well.
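The hardware-agnostic reward idea can be illustrated with a toy autotuner: candidate variants are simply run on the machine, and the measured runtime serves as the (negated) reward, with no device model consulted. Everything below (the blocked matmul, the block-size candidates, exhaustive search standing in for an RL policy) is my own hypothetical example and far simpler than PerfDojo's actual machinery:

```python
import time
import numpy as np

def measure(block, n=256, reps=3):
    """Reward = negated best wall-clock time of one candidate variant,
    measured directly on the hardware."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        c = np.empty((n, n))
        for i in range(0, n, block):    # hypothetical tunable: row-block size
            c[i:i + block] = a[i:i + block] @ b
        best = min(best, time.perf_counter() - t0)
    return -best                        # higher reward = faster variant

def autotune(candidates=(16, 32, 64, 128, 256)):
    """Exhaustive search stands in for the learned policy: keep the
    candidate with the highest measured reward."""
    return max(candidates, key=measure)
```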
A Sample-Free Compilation Framework for Efficient Dynamic Tensor Computation
This paper proposes a framework that selects and executes kernels at runtime to handle cases where the shape of the input data (tensor dimensions and sizes) changes during execution. Traditionally, kernels are compiled ahead of time against specific sample inputs, so performance can degrade on unexpected inputs. In contrast, this paper proposes a sample-free framework that does not depend on particular samples, achieved by combining several techniques. I had long wondered how to handle the problem this paper addresses, so it caught my interest.
Bridging the Gap Between Unstructured SpMM and Structured Sparse Tensor Cores
NVIDIA's Sparse Tensor Cores provide a mechanism that accelerates computations in which only two of every four consecutive elements are nonzero (so-called 2:4 structured sparsity). This paper proposes using that mechanism to accelerate SpMM (sparse-dense matrix multiplication). I have been interested in acceleration with Sparse Tensor Cores ever since they appeared, as many approaches to exploiting them are conceivable.
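To give a concrete sense of the 2:4 pattern, the sketch below prunes a dense matrix so that each group of four consecutive elements in a row keeps only its two largest-magnitude values, which is the usual way to prepare a matrix for Sparse Tensor Cores. This is a generic illustration of the format only, not the paper's method, which tackles the harder problem of mapping arbitrary unstructured sparsity onto this pattern:

```python
import numpy as np

def prune_2_4(W):
    """Force a matrix into the 2:4 structured-sparsity pattern: within each
    group of 4 consecutive elements along a row, keep the 2 entries with
    the largest magnitudes and zero the rest."""
    W = np.asarray(W, dtype=np.float64)
    rows, cols = W.shape
    assert cols % 4 == 0, "row length must be a multiple of 4"
    G = W.reshape(rows, cols // 4, 4)
    # indices of the two smallest-magnitude entries in each group
    drop = np.argsort(np.abs(G), axis=-1)[..., :2]
    mask = np.ones(G.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (G * mask).reshape(rows, cols)
```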
SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation
This paper also accelerates computation with Sparse Tensor Cores, but differs in that it targets stencil computations. GPUs are central to current HPC, and the acceptance of two papers exploiting this GPU feature reinforced that importance for me.
Conclusion
This was my first time participating in SC, and I was surprised by the sheer number of attendees. The exhibition was far larger in scale than at other international conferences I have attended, and the many Japanese exhibitors underscored the high level of interest in HPC.
I attended the paper presentations at SC25 mainly to explore technical trends in High-Performance Computing (HPC), especially those concerning AI computation. I had previously been interested in how to perform compiler optimization when input data changes at runtime, and in optimization methods using Sparse Tensor Cores. Seeing research presented on exactly these topics made clear to me that many researchers are working on these themes. The acceleration achieved with low-precision arithmetic hardware such as INT8 engines was particularly surprising. With AI computation evolving so rapidly and generating so many presentations, I felt it was essential to attend an international conference like this, engage directly with a wide range of research, and grasp the current trends. I intend to apply the insights gained from this experience to my future research and development.
*1: From the SC25 homepage https://sc25.supercomputing.org
*2: From the preface of the SC25 proceedings