The High Performance Big Data Research Team is investigating and developing software to facilitate extreme-scale big data processing, machine learning and deep learning for the K computer, Fugaku and beyond. The computational power of high performance computing (HPC) systems has been increasing dramatically, driven in particular by advanced multi/many-core architectures and new memory technologies such as high bandwidth memory and hybrid memory cubes. Although these HPC systems keep pace with the computational and memory performance required by scientific applications, they fall short of the I/O performance required by data-intensive applications.
To resolve this I/O problem in extreme-scale supercomputers, our research team is developing system software that facilitates a variety of big data processing by taking advantage of next-generation memory and storage architectures. We focus especially on several areas: fast and scalable parallel I/O for big data processing; scalable algorithms for machine learning and deep learning on hierarchical memory and storage architectures; scalable checkpoint/restart for fault tolerance; fast data transfer techniques for multi-petabytes of big data on high-speed networks; integration of big data software stacks for HPC; and virtualization and container technologies.
We will proactively collaborate with domestic and international researchers from private companies, academia and national laboratories. With the momentum gained through these collaborations, we will strengthen our international presence in extreme-scale big data processing.
Hierarchical, user-level and on-demand filesystem
We are studying how to efficiently process big data in HPC systems by taking advantage of next-generation hardware. Recent HPC systems have adopted burst buffer systems. Burst buffers are an additional storage tier residing on top of parallel file systems (PFSs) in the storage hierarchy. Although burst buffers provide higher bandwidth and lower latency than PFSs, this additional storage tier creates more complexity for application developers. Therefore, a new filesystem that helps application developers efficiently utilize these advanced storage technologies is critical.
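To illustrate the tiering idea, the following is a minimal sketch of a two-tier write path in which data lands in a fast burst-buffer tier first and is later drained to the PFS. The class name, directory layout and write-back drain policy are illustrative assumptions, not HuronFS internals.

```python
# Hypothetical two-tier store: a fast burst-buffer directory in front of a
# slower PFS directory, with a simple write-back drain. For illustration only.
import shutil
import tempfile
from pathlib import Path

class TwoTierStore:
    def __init__(self, bb_dir: Path, pfs_dir: Path):
        self.bb_dir = bb_dir    # fast tier (burst buffer)
        self.pfs_dir = pfs_dir  # slow tier (parallel file system)

    def write(self, name: str, data: bytes) -> None:
        # Absorb the write in the burst buffer at full speed.
        (self.bb_dir / name).write_bytes(data)

    def drain(self) -> int:
        # Later (e.g. in the background), flush buffered files to the PFS.
        moved = 0
        for f in list(self.bb_dir.iterdir()):
            shutil.move(str(f), str(self.pfs_dir / f.name))
            moved += 1
        return moved

    def read(self, name: str) -> bytes:
        # Prefer the fast tier; fall back to the PFS on a miss.
        for tier in (self.bb_dir, self.pfs_dir):
            p = tier / name
            if p.exists():
                return p.read_bytes()
        raise FileNotFoundError(name)

bb, pfs = Path(tempfile.mkdtemp()), Path(tempfile.mkdtemp())
store = TwoTierStore(bb, pfs)
store.write("ckpt.dat", b"checkpoint bytes")
assert store.read("ckpt.dat") == b"checkpoint bytes"  # served from burst buffer
store.drain()
assert store.read("ckpt.dat") == b"checkpoint bytes"  # now served from the PFS
```

The point of the sketch is that applications see one namespace while writes are absorbed by the fast tier, which is exactly the complexity a filesystem such as HuronFS hides from developers.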
To facilitate use of this new storage architecture, we developed HuronFS, a hierarchical, user-level and on-demand filesystem for exploiting burst buffers. Unlike conventional filesystems, HuronFS creates an on-demand two-level hierarchical storage system and caches frequently used files to accelerate I/O performance. HuronFS has multiple metadata servers for scalable parallel I/O. In addition, by using file replication, failure detection and recovery techniques, HuronFS remains resilient in the event of system failures. HuronFS also exploits low-latency and high-bandwidth interconnects (InfiniBand) in supercomputers.
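HuronFS scales metadata by using multiple metadata servers. One common way to realize this, sketched below under assumed details (the hash scheme and server count are illustrative, not taken from HuronFS), is to hash each file path to one server so that no single node owns the whole namespace.

```python
# Illustrative sketch: spread file metadata over several metadata servers by
# hashing each path to an owning server. Scheme and constants are assumptions.
import hashlib

N_MDS = 4  # assumed number of metadata servers

def mds_for(path: str) -> int:
    # A stable hash of the path deterministically picks the owning server,
    # so every client resolves the same path to the same server.
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_MDS

servers = {i: [] for i in range(N_MDS)}
for p in (f"/data/file{i:03d}" for i in range(1000)):
    servers[mds_for(p)].append(p)

# Each server ends up owning a share of the namespace,
# so metadata load is spread across the servers.
print([len(v) for v in servers.values()])
```

Because the mapping is deterministic, clients can locate the right metadata server without a central directory, which is what makes this kind of scheme scale for parallel I/O.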
As big data processing grows in importance, driven by machine learning, deep learning and artificial intelligence, I/O performance is becoming more critical than ever in supercomputers. Our new filesystem will facilitate big data processing in extreme-scale supercomputers and beyond.
Figure: Overview of HuronFS