R-CCS Cafe Special Edition (May 9, 2025) | RIKEN Center for Computational Science (R-CCS)


Details
Date	Friday, May 9, 2025
Time	16:00-17:00
City	Online
Venue	Remote seminar via Zoom. If you are outside R-CCS and would like to attend, please contact r-ccs-cafe[at]ml.riken.jp.
Organizer	RIKEN Center for Computational Science (R-CCS)
Language	Presentation and slides both in English
Speakers	Dr. Andreas Knüpfer (16:00-16:30), head of the Scientific Computing Core (SCC) department at the Center for Advanced Systems Understanding (CASUS) in Görlitz, Saxony, Germany, which is part of the Helmholtz-Zentrum Dresden-Rossendorf (HZDR)
	Ivan R. Ivanov (16:30-17:00), Junior Research Associate, High Performance Computing Modeling Research Team, RIKEN Center for Computational Science (R-CCS)

Talk Titles and Abstracts

1st Speaker: Dr. Andreas Knüpfer

Title:
Data Management and Machine-Actionable Reproducibility for HPC with git and Datalad
Abstract:
The talk will present two connected topics: Research Data Management (RDM) in HPC environments, and the reproducibility of computational results and how it can be adopted for HPC. Research Data Management according to the F.A.I.R. principles is an established concept and is required for most or all Computational Science projects, both for formal reasons (institutions and funding agencies demand it) and for practical reasons (it benefits researchers). Versioning of data collections is, for some reason, not part of the F.A.I.R. principles, even though it is most useful for software development and many types of text-based data collections. Git is the de facto standard for distributed version control there. The talk presents how git can be applied to general data, including large binary files. Furthermore, it presents extra benefits of such research data repositories for typical HPC projects. The second part of the talk introduces Datalad, a tool on top of git repositories, which offers machine-actionable reproducibility for computational results integrated in the git log of a data repository. Unfortunately, this is incompatible with HPC processing, because of batch processing and because git repositories don't allow concurrent accesses. We developed a solution for this conflict and can now offer machine-actionable reproducibility for HPC where many Slurm jobs contribute to large data repositories.
Speaker Info:
Andreas Knüpfer is the head of the Scientific Computing Core (SCC) department at the Center for Advanced Systems Understanding (CASUS) in Görlitz, Saxony, Germany, which is part of the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). His background is in Applied Mathematics from his first degree and in High Performance Computing (HPC) / Computer Science from his PhD. The mission of the SCC department is research on Computational Science methods, from classical HPC simulations and parallel programming to ML/DNN models and surrogate models, as well as Research Data Management (RDM) and collaborative software development methodologies and tools. Furthermore, it collaborates with and supports all other groups at CASUS with their Computational Science challenges, enabling better science by combining application-area expertise with the best computational approaches.

2nd Speaker: Ivan R. Ivanov

Title:
Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training
Abstract:
The size and complexity of software applications are increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tuned, tested, maintained, and up to date. As the discipline now begins to also incorporate automatically generated programs, automation in testing and tuning is required to keep up with the pace, let alone reduce the present level of complexity. While machine learning has been used to understand and generate code in various contexts, machine learning models themselves are trained almost exclusively on static code without inputs, traces, or other execution-time information. This lack of training data limits the ability of these models to understand real-world problems in software. In this work we show that inputs, like code, can be generated automatically at scale. Our generated inputs are stateful, and appear to faithfully reproduce the arbitrary data structures and system calls required to rerun a program function. Because our tool is built within the compiler, it can both be applied to arbitrary programming languages and architectures and leverage static analysis and transformations for improved performance. Our approach is able to produce valid inputs, including initial memory states, for 90% of the ComPile dataset modules we explored, for a total of 21.4 million executable functions.

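Notes

Please turn off your PC microphone and video when you join.
Depending on venue conditions and the network connection on the day, the broadcast may have to be cancelled or interrupted.
The program content and schedule are subject to change without notice.
Depending on your device or network environment, you may be unable to view the seminar.
Copyright in the internet broadcast belongs to the organizer and the presenters. Reproducing, copying, or modifying the distributed video and audio, or their content, on other websites or in other works without permission from RIKEN is prohibited.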

(May 8, 2025)