Workshop on LLM-Driven Code Generation and Automation for HPC and Scientific Computing

Registration

Attendance is free of charge, but registration is required. The registration deadline is June 12. Please contact us if you would like to participate after this date.

Access

LIP6 Laboratory, room 25-26 105, 1st floor
Sorbonne University, Campus Pierre et Marie Curie
4 place Jussieu, Paris, France
https://www.lip6.fr/informations/comment.php?LANG=en

Generative AI based on large language models (LLMs) is rapidly transforming high-performance computing (HPC) and scientific research workflows. This workshop brings together researchers from France and Japan to explore three interconnected frontiers: LLM-driven HPC code generation and autonomous research systems that operate across compiled-language and job-scheduler environments; hardware design automation spanning semiconductor development, HDL generation, and programming for emerging AI processors; and LLM-assisted numerical computation including time-series analysis, floating-point reliability, and linear solvers. Through research presentations and open discussion, participants aim to identify new directions and build opportunities for future international collaborative research.

13:30
13:35

Opening

13:35
14:50

Session 1: Code Generation and Research Automation Using LLMs

13:35
14:00

HPC-GENIE project - Generative AI for HPC code development

Daichi Mukunoki — Information Technology Center, Nagoya University

Rapid advances in coding AI are revolutionizing software development. This is equally true in the HPC field, though it presents unique challenges distinct from general code development. For example, there are various considerations beyond functional correctness, including architecture-specific performance optimization, support for GPUs and Fortran, selection of appropriate algorithms tailored to the target environment, and control of numerical accuracy. At the Information Technology Center of Nagoya University, we are promoting the "HPC-GENIE" project, which focuses on applying generative AI to HPC code development. Our primary interests lie in the development of AI agents and technologies designed to operate all systems including LLMs within the user's local environment. In this talk, we will introduce our research cases to date and discuss our outlook for HPC code development in the era of generative AI.

This work was supported by JSPS KAKENHI JP25K24387, the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN), and the JST Research and Development Program for Next-generation Edge AI Semiconductors (Grant Number JPMJES2511).

14:00
14:25

VibeCodeHPC: A Multi-Agent LLM Framework for Autonomous HPC Code Auto-Tuning — Toward an Agent-Driven Foundation for Semiconductor Design Workloads

Shun-ichiro Hayashi — Graduate School of Informatics, Nagoya University

VibeCodeHPC is a multi-agent LLM system in which multiple CLI-based agents coordinate to autonomously auto-tune workloads such as numerical kernels. Given a benchmark together with tuning requirements, the system explores optimization strategies, builds, executes, and iteratively improves performance. On representative HPC benchmarks, the multi-agent configuration consistently outperforms a single-agent baseline while exploring a wider variety of strategies. The framework is CLI-backend-agnostic and also supports local LLMs. The architecture itself is not kernel-specific: with appropriately authored requirement definitions (prompts), it can in principle drive a broader range of auto-optimization tasks — for example, semiconductor design workflows — positioning it as a foundation for agent-driven design and optimization workloads beyond HPC code generation.

This work was supported by the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) and the High-Performance Computing Infrastructure (HPCI) under Project ID jh250015. It was also partially supported by JSPS KAKENHI Grant Numbers JP23K11126 and JP24K02945. In addition, this work was supported by the JST Research and Development Program for Next-generation Edge AI Semiconductors (Grant Number JPMJES2511).

14:25
14:50

HPC-AutoResearch: Adapting Autonomous Research Systems for HPC through Split-Phase Execution

Takanori Kotama — Graduate School of Informatics, Nagoya University / RIKEN R-CCS

LLM-based autonomous research systems target only Python ML, leaving HPC—reliant on compiled languages and SLURM—untouched. We present HPC-AutoResearch, the first autonomous research system for HPC. It combines a Split-Phase Execution Model decomposing the pipeline into five phases (Planning, Setup, Coding, Compilation, Execution) inside Singularity containers with iterative repair, and a Compressed Inference Memory extending MemGPT with Core/Recall/Archival tiers. On a Himeno Benchmark task on an AMD EPYC 9554 SLURM cluster, HPC-AutoResearch alone scored 3/4 on all seven NeurIPS-style criteria (Accept, 7/10), outperforming Claude Code, Codex CLI, and Gemini CLI; ablating memory drops node success from 47.0% to 40.5%. Extensions to MPI and CFD/MD/first-principles workflows are discussed.

This study was supported by the JST Next-Generation Edge AI Semiconductor Research and Development Project JPMJES2511 and the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) (#jh250015).

14:50
15:10

Coffee Break

15:10
16:25

Session 2: Hardware Design

15:10
15:35

From Vibe Coding to Silicon: Local LLM Agents for AI-Assisted Semiconductor Design

Takahiro Katagiri — Information Technology Center, Nagoya University

Semiconductor design is increasingly limited not only by device technology, but also by the productivity of design description, verification, and iterative refinement. This talk presents Nagoya University’s HPC-GENIE activities, focusing on VibeCodeHPC, a multi-LLM-agent framework originally developed for autonomous code generation, execution, verification, and tuning of HPC software. In VibeCodeHPC, role-specialized agents such as the Project Manager, System Engineer, Programmer, and Continuous Deliverer collaborate through shared prompts, inter-agent communication, monitoring, and dynamic deployment. This architecture enables natural-language-driven “vibe coding” while keeping agents focused on requirements, implementation, validation, and deliverable management. Building on this concept, the talk discusses its extension from HPC programs to semiconductor design codes, including SystemC transaction-level models, Verilog/VHDL RTL modules, HLS descriptions, EDA scripts, testbenches, assertions, and design-space exploration workflows. A key direction is to combine local LLMs with domain-specific design rules, coding guidelines, simulation feedback, lint/formal checks, and version-controlled refinement loops. The goal is not to replace hardware designers, but to create secure local AI design partners that accelerate specification-to-code translation, improve verification productivity, and make complex silicon design workflows more interactive, reproducible, and explainable.

This work was supported by the JST Research and Development Program for Next-generation Edge AI Semiconductors (Grant Number JPMJES2511).

15:35
16:00

Parametrized HDL Code Generation For Activation Functions

Aurélien Delmotte — Sorbonne University

Implementing activation functions in hardware by hand is slow, error-prone, and hardware-dependent. This is especially true for the more exotic number formats that are becoming increasingly popular, where every new scenario requires a new implementation. In this talk, we will explore design challenges and solutions for implementing an efficient fixed-point softmax with custom precision and automatic pipelining.
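As a rough illustration of what "fixed-point softmax with custom precision" can mean, the following Python sketch simulates a softmax whose inputs, intermediate exponentials, and outputs are integers with a configurable number of fractional bits. It is only a software reference model under our own assumptions, not the speaker's generator: the names `to_fixed`, `fixed_softmax`, and `frac_bits` are hypothetical, the exponential is taken from `math.exp` and then quantized (a hardware design would more likely use a table or a polynomial), and pipelining is not modelled.

```python
import math


def to_fixed(x, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_softmax(values, frac_bits=8):
    """Softmax on fixed-point integers; returns integers scaled by 2**frac_bits."""
    scale = 1 << frac_bits
    q = [to_fixed(v, frac_bits) for v in values]
    q_max = max(q)
    # Subtracting the maximum keeps every exponent in (0, 1], so no overflow.
    # Assumption: exp() is evaluated in floating point and then quantized.
    exps = [to_fixed(math.exp((qi - q_max) / scale), frac_bits) for qi in q]
    total = sum(exps)  # at least `scale`, because the maximum maps to exp(0) = 1
    # Integer division, as a hardware divider would truncate.
    return [(e * scale) // total for e in exps]


probs = fixed_softmax([1.0, 2.0, 3.0], frac_bits=8)
```

Because each output is truncated, the results sum to slightly less than `2**frac_bits`; raising `frac_bits` trades hardware cost for a smaller truncation error.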

16:00
16:25

AI Coding for Emerging AI Hardware

Daichi Mukunoki — Information Technology Center, Nagoya University

Processors designed specifically for AI computing are emerging. Programming such hardware is challenging because of its unique, specialized nature and its dedicated programming languages, which lead to high learning costs. This talk introduces an example of code development with coding agents such as Claude Code for Tenstorrent hardware, RISC-V-based AI processors. The example focuses on using the AI processor for scientific computing workloads.

This work was supported by JSPS KAKENHI JP25K24387 and the JST Next-Generation Edge AI Semiconductor Research and Development Project JPMJES2511.

16:25
16:45

Coffee Break

16:45
18:00

Session 3: Numerical Computation

16:45
17:10

From Signals to Symbols: Enabling LLM-based Automation for Time Series Analysis

Xinye Chen — LIP6, Sorbonne University

Large language models have shown increasing promise in code generation and research automation, but their ability to assist with scientific workflows depends critically on whether domain data can be represented in a form they can understand. Time series data pose a particular challenge: they are continuous, noisy, high-dimensional, and not naturally aligned with the token-based interface of LLMs. In this talk, I will introduce LLM-ABBA, a time-series framework that turns time series into compact symbolic sequences. This representation makes downstream time-series tasks easier for an LLM to interpret and process. Though LLM-ABBA is not primarily built for code generation, its symbolic format can serve as an intermediate step for LLM-guided automation. For example, symbolic time series can help create code for feature extraction, motif discovery, anomaly detection, and other experimental tasks. I will explain how this focus on representation connects time series analysis with LLM-based research automation, and how it could help future systems both understand scientific signals and generate code to analyze them.

17:10
17:35

Benchmarking Large Language Models on Floating-Point Error Classification

Lisa Taldir — University of Perpignan

The study examines how Large Language Models (LLMs) can detect and classify floating-point arithmetic errors in software, which are subtle but potentially catastrophic. To evaluate this, the authors introduce InterFLOPBench, a benchmark comprising 90 C code instances and 1,130 test samples covering six error categories (cancellation, overflow, underflow, NaN, division by zero, and comparison errors), validated using FPChecker and Herbgrind. A dozen LLMs (including Gemini 2.5 Flash, GPT-4o, DeepSeek-R1, and Phi 4 reasoning) are tested. The evaluation framework treats floating-point error detection as a multi-label classification problem and employs the F1-score metric to measure performance. The results show that LLMs exhibit strong numerical reasoning, with the best models achieving micro F1 scores above 0.90. LLMs detect explicit errors such as comparison and division by zero well but struggle with subtler issues like underflow. LLMs are not yet meant to replace formal verification or dynamic analysis, but they are already effective as a semantic filter to locate suspicious code, as classification engines to categorize errors, and as explanation engines to detail why an operation is numerically unsafe. Combined with their ability to suggest more stable reformulations, they offer a promising complement to existing floating-point debugging workflows.
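For readers unfamiliar with the metric, the sketch below shows how a micro-averaged F1 score is computed when each code sample may carry several error labels at once. It is a minimal illustration in Python, not the InterFLOPBench evaluation code; the function name `micro_f1` and the toy labels are hypothetical.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over multi-label samples given as sets of labels."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not present
        fn += len(truth - pred)   # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: four samples, each with a set of error categories.
truth = [{"cancellation"}, {"overflow", "nan"}, {"underflow"}, set()]
pred = [{"cancellation"}, {"overflow"}, {"comparison"}, set()]
score = micro_f1(truth, pred)  # tp=2, fp=1, fn=2, so F1 = 4/7
```

Micro-averaging pools the counts over all labels before computing F1, so frequent categories weigh more than rare ones.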

17:35
17:55

Can LLMs Help with Rounding Error Analysis for Matrix Computation Algorithms?

Takeshi Fukaya — Hokkaido University

Rounding error analysis plays an important role in numerical linear algebra, particularly matrix computation, because it provides a theoretical basis for assessing the reliability and validity of numerical algorithms. However, such analysis is often difficult, especially for researchers who are not familiar with numerical error analysis. It also requires a skill set that is substantially different from that needed for designing or implementing algorithms. In addition to fundamental knowledge of floating-point arithmetic and matrix analysis, it is necessary to combine known results appropriately and to bound error terms carefully, which are generally nontrivial tasks. In this talk, we report an initial exploration of whether and how LLMs can help with rounding error analysis. As a case study, we consider a tall-skinny QR factorization algorithm that we have recently developed and examine the process of carrying out rounding error analysis with support from a cloud-based LLM service. Rather than presenting a completed methodology or a definitive evaluation, this talk focuses on our observations from this preliminary attempt, including what kinds of assistance may be possible, where difficulties arise, and what precautions are needed when applying LLMs to theoretical analysis in scientific computing.
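As a much simpler example of the kind of statement rounding error analysis produces (it is unrelated to the QR algorithm studied in the talk), the Python sketch below checks the standard textbook bound for recursive summation: the computed sum of n numbers differs from the exact sum by at most γ_{n-1} Σ|x_i|, where γ_k = ku / (1 - ku) and u is the unit roundoff. The sketch assumes IEEE 754 binary64 arithmetic with round-to-nearest (u = 2^-53) and no overflow; the function names are hypothetical.

```python
from fractions import Fraction

U = Fraction(1, 2**53)  # unit roundoff, assuming binary64 round-to-nearest


def recursive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def summation_error_and_bound(values):
    """Return the exact rounding error and the a priori bound, as Fractions."""
    n = len(values)
    exact = sum(Fraction(v) for v in values)      # exact rational arithmetic
    computed = Fraction(recursive_sum(values))    # floating-point result
    error = abs(computed - exact)
    gamma = (n - 1) * U / (1 - (n - 1) * U)       # gamma_{n-1}
    bound = gamma * sum(abs(Fraction(v)) for v in values)
    return error, bound


values = [0.1 * (-1) ** k * (k + 1) for k in range(1000)]
error, bound = summation_error_and_bound(values)
```

The explicit loop is used instead of the built-in `sum` because recent Python versions may apply compensated summation to floats, which would not be the algorithm the bound describes.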

17:55
18:10

Uncertainty-aware Computation: From Error Propagation to Reliable Decision Support

Maria Lizeth Reyna Cruz – University of Texas at El Paso

This talk presents ongoing research on uncertainty-aware learning, an effort to develop machine learning models that explicitly represent, quantify, and propagate uncertainty throughout the decision-making process. An error analysis of deep neural networks is presented as a preliminary step. Medical decisions are routinely made using incomplete information, ambiguous findings, measurement variability, and competing diagnostic or therapeutic alternatives. Yet most learning systems ultimately reduce these realities to single-point predictions. Using three medical applications (developmental dysplasia of the hip, treatment decision support for relapsed non-Hodgkin lymphoma, and hospital triage), this talk illustrates how uncertainty-aware models can provide bounded predictions, identify ambiguous cases, and support rather than replace clinical judgment.

18:00
18:30

Discussion

The Future of LLM-Driven HPC and Scientific Computing

18:30
18:35

Closing

Acknowledgements

PEQUAN (Performance and Quality of Numerical Algorithms) Team, LIP6, Sorbonne University — with special thanks to Prof. Stef Graillat and Prof. Fabienne Jézéquel
JST (Japan Science and Technology Agency) Research and Development Program for Next-generation Edge AI Semiconductors (Grant Number JPMJES2511)
JHPCN (Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures) and HPCI (High Performance Computing Infrastructure) project jh260017
JSPS (Japan Society for the Promotion of Science) KAKENHI (Grants-in-Aid for Scientific Research) JP24K02945

Organizers

Daichi Mukunoki, Nagoya University, Japan
David Defour, Université de Perpignan, France
Stef Graillat, Fabienne Jézéquel, Sorbonne Université, France

Contact

Daichi Mukunoki, Nagoya University
mukunoki <at> cc.nagoya-u.ac.jp