Enthusiastic software developer working in HPC since 2010, I enjoy taking on new challenges with cutting-edge technologies. With a strong research background, I hold a Ph.D. focused on optimizing network data movements and the scalability of HPC applications on very large systems. After working on compute-intensive applications, I am now focused on accelerating I/O-bound applications.
My areas of interest include:
Context: RED is a high-performance, software-defined object store. It internally supports a dual I/O engine that selects an optimized I/O path depending on the nature of the I/O: latency-driven or bandwidth-driven (bulk data). A hypothetical sketch of such path selection follows this entry.
Role: Runtime and network engineer.
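To make the dual I/O engine idea above concrete, here is a purely hypothetical C sketch of threshold-based path selection. The threshold value, enum, and function names are invented for illustration only and are not RED's actual API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch only: names and threshold are invented, not RED's API. */
#define BULK_THRESHOLD (64 * 1024)   /* assumed cutoff between small and bulk I/O */

enum io_engine { IO_ENGINE_LATENCY, IO_ENGINE_BANDWIDTH };

/* Route a request to the latency-optimized path (small or synchronous I/O)
 * or to the bandwidth-optimized path (large bulk transfers). */
static enum io_engine select_engine(size_t request_bytes, bool is_sync)
{
    if (is_sync || request_bytes < BULK_THRESHOLD)
        return IO_ENGINE_LATENCY;
    return IO_ENGINE_BANDWIDTH;
}

int main(void)
{
    printf("4 KiB sync write  -> %s\n",
           select_engine(4096, true) == IO_ENGINE_LATENCY ? "latency path" : "bandwidth path");
    printf("8 MiB async write -> %s\n",
           select_engine(8u << 20, false) == IO_ENGINE_LATENCY ? "latency path" : "bandwidth path");
    return 0;
}
```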
OFI Libfabric (202+ commits), a low-level network library that abstracts diverse networking technologies under a common API (a minimal usage sketch follows the contribution list below). Selected contributions:
MPICH, IOR, OpenMPI, FIO: added native IME support.
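As a minimal illustration of the "common API" point above, the sketch below only enumerates available libfabric providers with fi_getinfo. It is a toy example, not one of the contributions listed; the endpoint type and capability hints are assumptions made for the example.

```c
#include <stdio.h>
#include <rdma/fabric.h>

/* Minimal libfabric sketch: ask for reliable-datagram, message-capable
 * endpoints and list which providers (verbs, psm2, tcp, ...) can serve them. */
int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    hints->ep_attr->type = FI_EP_RDM;   /* reliable unconnected endpoint */
    hints->caps = FI_MSG;               /* two-sided send/recv messaging */

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        fi_freeinfo(hints);
        return 1;
    }

    for (struct fi_info *cur = info; cur; cur = cur->next)
        printf("provider: %-10s fabric: %s\n",
               cur->fabric_attr->prov_name, cur->fabric_attr->name);

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```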
Context: Infinite Memory Engine (IME) is a scale-out, flash-native, software-defined storage cache that streamlines the data path for application I/O.
Role: Lead developer of the network communication layer in IME, which supports InfiniBand (EDR/FDR), Intel Omni-Path, and Ethernet networks. The network code has been shown to scale to 2048 compute nodes and to deliver I/O performance exceeding 1 TB/s on the Oakforest-PACS system (JCAHPC). Also responsible for data-locality optimizations and performance analysis in highly distributed environments.
Role: Study of an HPC solution in the context of cloud computing. A proof of concept was designed based on Linux Containers (LXC) and the low-latency Cisco usNIC protocol over 10 Gigabit Ethernet links.
Title: Improving memory consumption and performance scalability of HPC applications with multi-threaded network communications
Summary
I developed, in C, a multi-threaded communication layer over InfiniBand. I focused on scientific applications parallelized with the Message Passing Interface (MPI) standard and on the low-level OFA verbs communication API.
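As a point of reference for the "low-level OFA verbs" layer mentioned above, here is a minimal verbs sketch that only enumerates RDMA devices. It is far simpler than the MPC communication layer itself and is included purely for illustration.

```c
#include <stdio.h>
#include <infiniband/verbs.h>

/* Minimal OFA verbs sketch: enumerate RDMA devices and print a few
 * attributes. A real communication layer goes on to create protection
 * domains, completion queues, and queue pairs on top of this. */
int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (!ibv_query_device(ctx, &attr))
            printf("%s: %d port(s), max_qp=%d\n",
                   ibv_get_device_name(devs[i]), attr.phys_port_cnt, attr.max_qp);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(devs);
    return 0;
}
```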
Achievements
Teaching assistant: 108 hours
Design and development of a multi-threaded communication layer in a shared-memory context for the Message Passing Interface (MPI) standard. The communication layer was implemented inside MPC, a state-of-the-art runtime fully supporting MPI 1.3 (http://mpc.sourceforge.net/).
Maintenance and evolution of a web application built on IBM Lotus Domino.
Development of a solution to easily deploy and manage clusters of virtual machines using the Xen hypervisor, and its integration into the University's network.
Doctor of Philosophy - PhD, Computer Science - High Performance Computing
Master’s degree in Computer Engineering, Network communications & Embedded systems
With the rising complexity of parallel applications, the need for computational power is continually growing. Recent trends in High-Performance Computing (HPC) have shown that improvements in single-core performance will not be sufficient to face the challenges of an Exascale machine: we expect an enormous growth in the number of cores as well as a multiplication of the data volume exchanged across compute nodes. To scale applications up to Exascale, the communication layer has to minimize the time spent waiting for network messages. This paper presents a message-progression strategy based on Collaborative Polling, which allows efficient, auto-adaptive overlap of communication phases with computation. The approach is novel in that it increases the application's overlap potential without introducing the overheads of a threaded message progression. We implemented our approach for InfiniBand inside MPC, a thread-based MPI runtime. We evaluate the gain from Collaborative Polling on the NAS Parallel Benchmarks and three scientific applications, where we show significant improvements in communication time, up to a factor of 2.
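The overlap described in the abstract can be pictured with a standard MPI nonblocking pattern. The sketch below is generic MPI, not MPC's Collaborative Polling implementation; it simply shows the communication phase that a runtime must progress while the application computes. Buffer sizes and the ring-neighbor exchange are arbitrary choices for the example.

```c
#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

/* Independent work the process can perform while messages are in flight. */
static void compute(double *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * 1.0001 + 1.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *halo = calloc(N, sizeof *halo);         /* data expected from the previous rank */
    double *interior = calloc(N, sizeof *interior); /* data this rank owns and sends onward */
    int next = (rank + 1) % size;
    int prev = (rank - 1 + size) % size;

    MPI_Request reqs[2];

    /* Post communication early... */
    MPI_Irecv(halo, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(interior, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ...then compute on independent data. How much of the transfer actually
     * progresses here depends on the MPI runtime; Collaborative Polling is one
     * way to drive that progression from otherwise idle tasks. */
    compute(interior, N);

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    free(halo);
    free(interior);
    MPI_Finalize();
    return 0;
}
```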
Credential ID: WVNDUB9QW52N
Credential ID: 9DW8RMJHN2UL
Credential ID: S6K5VPS42EN4
Credential ID: 7YL9VKSW78K2
Credential ID: M43A6MFFQ8QS
Credential ID: 5A3LCGX42GX9