JHPCN Field Workshop

State-of-the-Art in Code Generative AI for High-Performance Computing

Details

Date
December 5, 10:00 - 17:35
Venue
Lecture Room, 2F, Information Technology Center, Nagoya University
Participation via Zoom is also possible.
Support
Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructure (JHPCN)
jh250015: New Horizon Technology of Auto-tuning by Software Engineering
Grant-in-Aid for Scientific Research (B) (24K02945) "Realizing Adaptive Job Scheduling through Coordination with Workflow Engines"
Registration
Participation is free of charge, but please register by filling out the form below by December 4.
https://forms.gle/HM4vvj7ibNj1C6X38
- Please indicate via the form whether you will attend the reception by November 28 (now closed).
- We will send the Zoom URL to online participants by the event date.

Abstract

In recent years, remarkable progress has been made in code generation AI using large language models (LLMs). In particular, their application to HPC codes has been rapidly expanding worldwide. On the other hand, research activities in this field within Japan remain relatively limited. This workshop will feature invited talks introducing research in the United States on the automatic generation of numerical libraries (such as BLAS) and the application of generative AI and auto-tuning to scientific computing codes, thereby contributing to the advancement of interdisciplinary research.

Program (December 3rd) - JHPCN Field Workshop Private Meeting (1)

Attendees: Prof. Daichi Mukunoki (Nagoya University, JAPAN), Dr. Osni Marques (Lawrence Berkeley National Laboratory, USA), Prof. Shuji Morisaki (Nagoya University, JAPAN), Mr. Hiroto Kashimura (Nagoya University, JAPAN)
Venue: Room 402 (4F), Information Technology Center, Nagoya University
Topic: Optimizing test cases for STCollection with machine learning.

14:00-14:05
Opening
Prof. Daichi Mukunoki (Nagoya University, JAPAN)
14:05-17:30
Discussion: Optimizing software test cases in numerical calculation libraries
18:30-
Reception

Program (December 4th) - JHPCN Field Workshop Private Meeting (2)

Attendees: Prof. Takahiro Katagiri (Nagoya University, JAPAN), Prof. Tetsuya Hoshino (Nagoya University, JAPAN), Prof. Daichi Mukunoki (Nagoya University, JAPAN), Mr. Shun-ichiro Hayashi (Nagoya University, JAPAN), Dr. Osni Marques (Lawrence Berkeley National Laboratory, USA), Dr. Pedro Valero Lara (Oak Ridge National Laboratory, USA)
Venue: Room 402 (4F), Information Technology Center, Nagoya University
Topic: Towards the Adaptation of Generative AI to Numerical Software

10:00-10:05
Opening
Prof. Takahiro Katagiri (Nagoya University, JAPAN)
10:05-12:00
Discussion: Current Status of the HPC-GENIE Project
12:00-14:00
Lunch Break
14:00-15:30
Discussion: Current Status at ORNL, CMU, and Berkeley
15:30-16:00
Break
16:00-17:30
Free Discussion
18:30-
Reception

Program (December 5th) - JHPCN Field Workshop

10:00-10:05
Opening
Prof. Takahiro Katagiri (Nagoya University, JAPAN)
10:05-11:00
Invited Talk (1)
JACC (Julia for Accelerators): An Environment for Performance-Portable and Heterogeneous High-Performance Computing
Dr. Keita Teranishi (Oak Ridge National Laboratory, USA)

JACC (Julia for Accelerators) is a programming framework that enables scientists to write fast, portable, and efficient code for high-performance computing systems. Built on Julia—a modern language designed for scientific computing—JACC combines high performance, powered by the open-source LLVM compiler, with a simple, easy-to-learn syntax.
JACC unifies features from diverse programming tools across hardware platforms and provides a consistent interface for Julia users. With this unified frontend layer, applications can target both CPUs and accelerators such as GPUs (CUDA, HIP, and oneAPI) from a single codebase. For advanced users, JACC also offers tools to fine-tune and optimize performance on cutting-edge accelerator hardware.
Two core features highlight JACC’s design:
1. JACC Arrays manage memory and simplify data movement between CPUs and accelerator devices.
2. JACC parallel constructs allow users to express parallel loops (e.g., for and reductions) that run efficiently across CPUs and GPUs.
Unlike C++ frameworks such as Kokkos and SYCL, JACC supports runtime performance portability through Julia’s Just-In-Time (JIT) compilation and interactive environment. This reduces development time, simplifies the programming model, and allows the same code to run seamlessly on multiple backends. As a result, JACC improves both productivity and performance portability.
In the presentation, we will discuss the latest features and performance of JACC, including results from scientific benchmarks, multi-GPU support, and capabilities for extreme heterogeneous computing.
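JACC itself is written in Julia; the following Python sketch (all names are ours, not JACC's actual API) only illustrates the design idea described above: a single unified parallel-for frontend that dispatches the same user kernel to whichever backend is selected.

```python
# Registry of available backends, keyed by name.
BACKENDS = {}

def register(name):
    """Decorator that records a backend implementation under a name."""
    def deco(fn):
        BACKENDS[name] = fn
        return fn
    return deco

@register("threads")  # stand-in for a CPU backend
def _threads(n, body, *args):
    for i in range(n):  # sequential here; a real backend would use threads
        body(i, *args)

@register("cuda")  # stand-in for a GPU backend
def _cuda(n, body, *args):
    for i in range(n):  # placeholder; a real backend would launch a kernel
        body(i, *args)

def parallel_for(n, body, *args, backend="threads"):
    """Unified frontend: the same call targets any registered backend."""
    BACKENDS[backend](n, body, *args)

# The kernel (axpy) is written once and runs unchanged on either backend.
x = [1.0, 2.0]
y = [10.0, 20.0]

def axpy(i, alpha):
    x[i] = x[i] + alpha * y[i]

parallel_for(2, axpy, 2.0, backend="cuda")
print(x)  # [21.0, 42.0]
```

The point of the pattern is that application code depends only on the frontend call, so switching hardware is a one-argument change rather than a rewrite.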

11:00-11:45
Invited Talk (2)
ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem
Dr. Pedro Valero Lara (Oak Ridge National Laboratory, USA)

ChatHPC democratizes large language models for the high-performance computing (HPC) community by providing the infrastructure, ecosystem, and knowledge needed to apply modern generative AI technologies to rapidly create specific capabilities for critical HPC components while using relatively modest computational resources. Our divide-and-conquer approach focuses on creating a collection of reliable, highly specialized, and optimized AI assistants for HPC based on the cost-effective and fast Code Llama fine-tuning processes and expert supervision. We target major components of the HPC software stack, including programming models, runtimes, I/O, tooling, and math libraries. Thanks to AI, ChatHPC provides a more productive HPC ecosystem by boosting important tasks related to portability, parallelization, optimization, scalability, and instrumentation, among others. With relatively small datasets (on the order of KB), the AI assistants, which are created in a few minutes by using one node with two NVIDIA H100 GPUs and the ChatHPC library, can create new capabilities with Meta’s 7-billion parameter Code Llama base model to produce high-quality software with a level of trustworthiness of up to 90% higher than the 1.8-trillion parameter OpenAI ChatGPT-4o model for critical programming tasks in the HPC software stack.

11:45-13:30
Lunch Break
13:30-14:30
Invited Talk (3)
AI and HPC Synergies: Developments and Opportunities
Dr. Osni Marques (Lawrence Berkeley National Laboratory, USA)

AI-driven tools have become prominent components across a wide range of computational science and engineering applications, including materials science, drug discovery, vaccine development, network operations, advanced data analysis, cybersecurity, software optimization, and code generation. Today, AI tools are estimated to account for over 40% of all programming activity across different domains and numerous programming languages. Significant advancements have been made in leveraging AI technologies to streamline HPC, while the synergy between AI and HPC continues to grow. This presentation will discuss ongoing activities and developments at the intersection of AI and HPC, showcasing some current applications and exploring opportunities for future work.

14:30-15:15
Invited Talk (4)
SPIRAL: AI for High Performance Code
Prof. Franz Franchetti (Carnegie Mellon University, USA)

This talk provides a current and comprehensive overview of the SPIRAL system, which has been developed over 25 years through a collaboration among Carnegie Mellon University, Drexel University, and UIUC, and is now available as a BSD-licensed open-source system. We show that SPIRAL is a rule-based AI system that captures the knowledge of how algorithms, computer architectures, and program transformations are defined and interact. We develop the underlying formal framework to capture computational algorithms, computing platforms, and program transformations of interest, using a unifying mathematical formalism we call the operator language (OL). We then cast the problem of synthesizing highly optimized computational kernels for a given machine as a strongly constrained optimization problem that is solved by a multi-stage rewriting system. Since all rewrite steps are semantics-preserving identity operations, our approach allows us to formally prove the equivalence between the kernel specification and the synthesized program. We present a first look at a SPIRAL-based semantic lifting approach that inverts the rewriting direction and derives the semantics of code by searching for rule-application sequences that would lead to a given code fragment. We briefly discuss the X-libraries FFTX, PROTOX, GBTLX, and NTTX, implemented as eDSLs (embedded domain-specific languages/libraries) in C++, as well as their instantiation as plugins for LLVM/CLANG/FLANG, Python, and Julia.
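As one concrete illustration of the kind of breakdown rule the operator-language formalism captures (this particular factorization is standard in the SPIRAL literature), the recursive Cooley-Tukey FFT is expressed as a rewrite rule on the DFT matrix:

\[
\mathrm{DFT}_{mn} \;\rightarrow\; (\mathrm{DFT}_m \otimes I_n)\; T^{mn}_n\; (I_m \otimes \mathrm{DFT}_n)\; L^{mn}_m ,
\]

where \(T^{mn}_n\) is the diagonal twiddle-factor matrix and \(L^{mn}_m\) the stride permutation. Repeatedly applying such semantics-preserving rules, with search over the choices, yields an optimized implementation tailored to the target machine.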

15:15-15:30
Break
15:30-16:00
Talk
HPC-GENIE: A Multi-Agent Code Generation Platform Project Based on Context Engineering
Prof. Takahiro Katagiri (Nagoya University, JAPAN)

This presentation introduces the HPC-GENIE project, which develops an HPC code generation AI platform based on context engineering and multi-AI agent collaboration. In HPC-GENIE, we are developing a generative AI framework capable of automatically producing GPU codes such as CUDA from programs written in Fortran, aiming for a dramatic improvement in HPC programming productivity for next-generation supercomputers such as the supercomputer “Fugaku NEXT”. Furthermore, HPC-GENIE envisions the utilization of local LLMs, with the future goal of providing code generation AI services on Japan’s HPCI-based supercomputing infrastructure. In this presentation, we will describe the design philosophy of HPC-GENIE and provide an overview of its prototype system, VibeCodeHPC.

16:00-16:20
Talk
Automatic Generation of Numerical Codes for GPUs Using LLMs
Prof. Daichi Mukunoki (Nagoya University, JAPAN)

In recent years, generative AI technologies based on Large Language Models (LLMs) have advanced rapidly and are now widely applied to automatic code generation. In general application development, their effectiveness has already been demonstrated in tasks such as rapid prototyping and code completion. However, in the context of High-Performance Computing (HPC), it is not sufficient for programs to merely “work correctly”; sophisticated optimizations tailored to specific hardware architectures are indispensable. For numerical computation codes in particular, issues of computational accuracy, rounding error accumulation, and numerical stability must also be carefully addressed. These requirements are considerably more complex than those encountered in conventional software generation. This presentation focuses on the automatic generation of GPU-oriented numerical computation codes, examining both the current capabilities of general-purpose LLMs and strategies for leveraging them effectively. Specifically, we analyze the performance of code generated for fundamental linear algebra operations (e.g., Basic Linear Algebra Subprograms, BLAS) and compare results across multiple GPU architectures (NVIDIA, AMD, and Intel) to highlight differences in the generated code. Furthermore, we introduce and discuss techniques aimed at improving code quality. Through these investigations, this talk aims to clarify both the potential and the limitations of LLM-based code generation in numerical computing, and to explore promising directions for future research.
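As a minimal illustration of the accuracy issues raised above (a toy example of ours, not from the talk): a naive left-to-right dot product, the form an LLM typically generates first, can lose significant digits under cancellation, while a compensated summation (Neumaier's variant of Kahan summation) recovers them.

```python
def dot_naive(x, y):
    """Left-to-right accumulation, as typically generated first."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s

def dot_compensated(x, y):
    """Neumaier compensated summation: tracks lost low-order bits."""
    s = 0.0
    c = 0.0  # running compensation term
    for xi, yi in zip(x, y):
        p = xi * yi
        t = s + p
        if abs(s) >= abs(p):
            c += (s - t) + p  # low-order bits of p were lost
        else:
            c += (p - t) + s  # low-order bits of s were lost
        s = t
    return s + c

# Severe cancellation: the exact answer is 1.0, but naive accumulation
# absorbs the middle term into 1e16 and loses it entirely.
x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(dot_naive(x, y))        # 0.0
print(dot_compensated(x, y))  # 1.0
```

Whether generated code uses the naive or the compensated form is exactly the kind of difference that functional tests alone do not expose.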

16:20-16:30
Break
16:30-16:45
Talk
VibeCodeHPC: A CLI-Based Multi-Agent System for Auto-Tuning
Mr. Shun-ichiro Hayashi (Nagoya University, JAPAN)

In recent years, command-line interfaces (CLIs) backed by LLMs have advanced remarkably. CLIs such as Claude Code give LLMs effective tools: automated command-line execution through a shell environment (e.g., bash) and functions for searching, reading, and writing files. Through these shell tools, an LLM can set up its environment and run and debug code on its own. Such CLIs also support Model Context Protocol (MCP) servers built by many developers, which enable LLMs to solve domain-specific tasks.
However, making full use of these features in HPC environments remains challenging. For instance, supercomputers have vendor-specific commands, modules, and job classes for batch queues, and much of their documentation is not publicly available, so LLMs lack this knowledge. Moreover, the operating systems on some machines are too old to install the modern Node.js runtime that such CLIs require.
To address these issues, we propose VibeCodeHPC, a multi-agent auto-tuning system built on CLI-based LLM agents such as Claude Code. Our framework specifies how to set up the environment, what information to provide, and where to place it. We carefully designed secure SSH sessions and job management, and defined the steps for connecting from the user's PC to the login node. We also implemented a dynamic agent-spawning mechanism orchestrated by a Project Manager (PM) agent, which organizes the other agents to suit the user's requirement definition.
Preliminary evaluations on several auto-tuning case studies indicate that the multi-agent system adheres to prompts more faithfully than a single-agent execution. Finally, we will share our vision for future work and outline a roadmap for integrating local LLMs into our framework.
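The dynamic agent-spawning design can be pictured with a toy sketch (all names and roles below are our own illustration, not VibeCodeHPC's actual interface): a PM object reads a requirement definition and creates one worker agent per requested role.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str  # e.g., a code-generating or job-submitting worker
    task: str

@dataclass
class ProjectManager:
    """Orchestrator that spawns workers to match the requirements."""
    workers: list = field(default_factory=list)

    def spawn_for(self, requirements):
        """Spawn one worker agent per role in the requirement definition."""
        for role, task in requirements.items():
            self.workers.append(Agent(role=role, task=task))
        return self.workers

pm = ProjectManager()
team = pm.spawn_for({
    "programmer": "generate and tune the GPU kernel",
    "operator": "submit and monitor batch jobs on the login node",
})
print([a.role for a in team])  # ['programmer', 'operator']
```

The design choice illustrated here is that the team composition is data (the requirement definition), not code, so the PM can assemble a different team for each user request.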

16:45-17:10
Talk
Evaluating Claude Code’s Coding and Test Automation for GPU Acceleration of a Legacy Fortran Application: A GeoFEM Case Study
Prof. Tetsuya Hoshino (Nagoya University, JAPAN)

With the rise of AI research, supercomputers equipped with GPUs by default have become commonplace, and the use of GPUs in simulation has become even more important. At the same time, porting legacy codes developed for CPUs to GPUs remains a major challenge. A promising way to address this issue is to leverage rapidly advancing code-generation AI; however, because this approach is very new, it has not been sufficiently evaluated. In this study, we attempt GPU-oriented code generation using Claude Code—one such code-generation AI—for GeoFEM, a Fortran-based finite element application parallelized with MPI+OpenMP. We systematically investigate methods to achieve both the generation of high-performance GPU code and the streamlining of the code-generation process itself, including automated test execution.
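A core ingredient of such automated test execution can be sketched as follows (a generic illustration with our own names, not GeoFEM's actual test suite): after each AI-generated transformation, the ported kernel's output is compared against the trusted CPU baseline within a tolerance, and the change is accepted or rejected automatically.

```python
def max_rel_error(baseline, ported):
    """Largest element-wise deviation, scaled by the baseline magnitude."""
    scale = max(max(abs(v) for v in baseline), 1e-300)
    return max(abs(a - b) for a, b in zip(baseline, ported)) / scale

def accept_port(baseline, ported, rtol=1e-6):
    """Accept the ported code only if it reproduces the baseline output."""
    return max_rel_error(baseline, ported) <= rtol

# Toy stand-ins for solver output vectors before and after porting.
baseline   = [1.0, 2.0, 3.0]
ported_ok  = [1.0 + 1e-8, 2.0, 3.0 - 2e-8]
ported_bad = [1.0, 2.1, 3.0]

print(accept_port(baseline, ported_ok))   # True
print(accept_port(baseline, ported_bad))  # False
```

A relative (rather than absolute) tolerance is used because GPU and CPU floating-point results legitimately differ in the low-order bits.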

17:10-17:30
Invited Talk
The Project for Advancement of Software Usability in Materials Science (PASUMS) in the Era of Generative AI: Toward AI-Assisted Workflows and New Approaches to Human Resource Development
Dr. Kazuyoshi Yoshimi (The Institute for Solid State Physics, University of Tokyo, JAPAN)

In computational materials science, numerous research software packages are developed and maintained, requiring significant effort for continuous improvement and widespread adoption. At the Institute for Solid State Physics (ISSP), The University of Tokyo, the Project for Advancement of Software Usability in Materials Science (PASUMS) promotes software development and enhancement, fostering usability, interoperability, and training for the community.
PASUMS leverages generative AI technologies to enhance coding workflows, documentation, and user support. This presentation will introduce recent initiatives in AI-driven software development and capacity building, and discuss future prospects for the sustainability of the research software ecosystem through AI-assisted workflows and the cultivation of next-generation developers.

17:30-17:35
Closing
Prof. Takahiro Katagiri (Nagoya University)
18:30-
Reception

Program (December 6th) - JHPCN Field Workshop Private Meeting (3)

Attendees: Prof. Takahiro Katagiri (Nagoya University, JAPAN), Dr. Osni Marques (Lawrence Berkeley National Laboratory, USA), Prof. Franz Franchetti (Carnegie Mellon University, USA)
Venue: Room 516 (5F), Information Technology Center, Nagoya University
Topic: Advanced Topics of Numerical Libraries and Auto-tuning Technology

10:30-12:30
Discussion: Current Status of Numerical Libraries and Auto-tuning Technology