Open Theses

Important remark on this page

The following list is by no means exhaustive or complete. There is always some student work to be done in various research projects, and many of these projects are not listed here. Don't hesitate to drop an email to any member of the chair asking for currently available topics in their field of research.

Abbreviations:

  • PhD = PhD Dissertation
  • BA = Bachelorarbeit, Bachelor's Thesis
  • MA = Masterarbeit, Master's Thesis
  • GR = Guided Research
  • CSE = Computational Science and Engineering

Cloud Computing / Edge Computing / IoT / Distributed Systems

MA/GR: A Framework for Federated Learning using Serverless Computing

Background

Federated learning (FL) enables resource-constrained edge devices to jointly learn a shared machine learning (ML) or deep neural network (DNN) model while keeping the training data local, providing privacy, security, and economic benefits. However, building a shared model across heterogeneous devices, ranging from resource-constrained edge devices to cloud servers, makes efficient management of FL clients challenging. Furthermore, with the rapid growth in the number of FL clients, scaling the FL training process is also difficult.
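
For illustration, the core aggregation step of FL (federated averaging) can be sketched in a few lines of Python; the function and shapes below are illustrative only and are not part of our baseline implementation:

    # Minimal FedAvg sketch: aggregate locally trained client models into a
    # shared global model, weighting each client by its local dataset size.
    import numpy as np

    def federated_average(client_weights, client_sizes):
        total = sum(client_sizes)
        return [
            sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
            for layer in range(len(client_weights[0]))
        ]

    # Example: three clients, each holding two weight tensors ("layers").
    clients = [[np.random.rand(4, 4), np.random.rand(4)] for _ in range(3)]
    global_model = federated_average(clients, [100, 250, 50])

In the FaaS setting targeted here, each client update and the aggregation step would run as separate serverless functions.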

Goals

  1. The aim of this work is to develop and implement a framework for federated learning on heterogeneous devices using FaaS-based functions. We already have a baseline implementation, which needs to be extended.
  2. Ensuring fault tolerance and scalability of different components in the FL system.
  3. Incorporating privacy and security in the framework.

Requirements

  • Good knowledge of ML and deep learning.
  • Good knowledge of Python.
  • English communication skills.
  • Knowledge of FaaS platforms.
  • Minimum of 80 ECTS completed.

We offer:

  • A thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to the LRZ cloud
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de) or Anshul Jindal (anshul.jindal@tum.de).

MA/GR: Evaluating Open Source Serverless Frameworks for Heterogeneous Edge Devices

Background

Serverless computing, with function-as-a-service (FaaS), is an attractive cloud model in which the user is not responsible for server deployment and infrastructure management, but only for writing and packaging the code. In FaaS, an application is decomposed into simple, standalone functions that are deployed to a serverless platform for execution. Although originally designed for cloud environments, serverless computing is gaining traction in edge computing. To avoid dependence on a specific vendor, several open-source serverless frameworks have been proposed. However, their usability and viability on heterogeneous edge devices are still unclear.
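
As a concrete example of the FaaS programming model, an Apache OpenWhisk Python action is just a file with a main() function that receives a dict of parameters and returns a dict; the handler below is a minimal sketch:

    # hello.py -- a minimal OpenWhisk-style Python action.
    # OpenWhisk calls main() with the invocation parameters and serializes
    # the returned dict as the JSON result of the function.
    def main(args):
        name = args.get("name", "world")
        return {"greeting": f"Hello, {name}!"}

Such a function would be deployed with the wsk CLI (e.g., wsk action create hello hello.py); other open-source frameworks such as OpenFaaS or Knative use slightly different packaging but the same function-centric model.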

Goals:

  1. The aim of this work is to evaluate and analyze the performance of different open-source serverless frameworks on heterogeneous edge devices such as FPGAs and Raspberry Pis for particular use cases, e.g., Edge AI.

Requirements

  • Good knowledge of Python.
  • Basic knowledge of FaaS platforms.
  • Knowledge of Docker, Kubernetes, and monitoring stacks such as Prometheus.

We offer:

  • A thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to different systems required for the work
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de) or Anshul Jindal (anshul.jindal@tum.de).

 

BA/GR: Comparing Microservices and Serverless based Architectures for IoT Applications

Background

With the rise of the microservice architecture, adopted for its agility, scalability, and resiliency in building cloud-based applications and deploying them in containers, DevOps came into demand for handling development and operations together. Nowadays, however, serverless computing offers a new way of developing and deploying cloud-native applications. Serverless computing, also called NoOps, offloads management and server configuration (the operations work) from the user to the cloud provider and lets the user focus only on product development. Hence, there is debate about which deployment strategy to use.

Goals:

  1. The aim of this work is to evaluate and analyze the performance of a scalable application that receives and stores IoT sensor data, implemented with both a microservice-based and a FaaS-based architecture.

Requirements

  • Good knowledge of JavaScript.
  • Basic knowledge of FaaS platforms. Knowledge of OpenWhisk is beneficial.
  • Knowledge of Docker and Kubernetes.
  • Experience with a load-testing tool such as k6.
  • Knowledge of GKE or AKS.
  • Basic knowledge of Kafka, MQTT, and Elasticsearch.

We offer:

  • A thesis in an area that is in high demand in industry
  • Our expertise in data science and systems areas
  • Supervision and support during the thesis
  • Access to different systems required for the work
  • Opportunity to publish a research paper with your name on it

What we expect from you:

  • Devotion and persistence (= full-time thesis)
  • Critical thinking and initiative
  • Attendance of feedback discussions on the progress of your thesis

Apply now by submitting your CV and grade report to Mohak Chadha (mohak.chadha@tum.de) or Anshul Jindal (anshul.jindal@tum.de).

 

BA: Development of an Arduino-based Power Collection System

Background: 

As part of the SensE project (see https://sense.caps.in.tum.de), we are exploring various architectures for the development of a sensor data processing system based on a collaborative Edge-Cloud paradigm. In this context, various model partitioning and load distribution schemes are evaluated. We want to develop a performance data collection system to study the trade-offs between the different schemes.

Description: 

We intend to develop an Arduino-based power data collection system that collects and stores the power consumption of various edge systems, e.g., Jetson Nano boards and NUCs. The power data is collected using power meters and transferred to an Arduino board, which stores the data and sends it to a database (DCDB: https://gitlab.lrz.de/dcdb/dcdb) for central storage and further analysis.
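
One conceivable host-side building block, shown here only as a hedged sketch, is a small reader that consumes the power readings the Arduino emits over its serial port; the port name, baud rate, and "timestamp,watts" line format are assumptions, and the print() stands in for the actual DCDB ingestion:

    # Read power samples from the Arduino over serial (pyserial).
    import serial

    with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as port:
        while True:
            line = port.readline().decode("ascii", errors="ignore").strip()
            if not line:
                continue  # read timed out, try again
            timestamp, watts = line.split(",")
            # Forward to central storage (e.g., DCDB) here.
            print(f"{timestamp}: {float(watts):.2f} W")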

Tasks:

  1. HW and system setup
  2. Development of the power data collection on the Arduino
  3. Integration of the data collection into DCDB through SysFS plugins

Contact: (amir.raoofy@tum.de)

 

BA: Performance Evaluation of Storage Backends for Sensor Data Streams

Background: 

Sensor data streams generate data at very high rates, and large parts of this data are used for training machine learning models. It is therefore important that the data can be accessed quickly and stored efficiently. As part of the SensE project (see https://sense.caps.in.tum.de), we are evaluating different storage backends for sensor stream storage.

Description: 

Sensor data storage systems are built with different key requirements, leading to different performance characteristics and hardware resource requirements.

In this thesis, you will evaluate existing storage alternatives in order to characterise their performance. You are going to benchmark different stream storage systems with respect to throughput and other performance metrics.
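
A minimal benchmarking harness could look like the following sketch, where write_batch is a backend-specific callable (InfluxDB, TimescaleDB, ...) plugged in per system under test; the harness itself is illustrative and not tied to any client library:

    import time

    def benchmark(write_batch, num_batches=100, batch_size=10_000):
        """Return ingest throughput in samples per second."""
        batch = [(i, float(i)) for i in range(batch_size)]  # (timestamp, value)
        start = time.perf_counter()
        for _ in range(num_batches):
            write_batch(batch)
        elapsed = time.perf_counter() - start
        return num_batches * batch_size / elapsed

    # Dummy in-memory backend; replace with a real client's write call.
    sink = []
    print(f"{benchmark(sink.extend):.0f} samples/s")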

Your Tasks:

  1. Collect information about which systems should be evaluated and understand their main design goals. Alternatives include (but are not limited to)
    • InfluxDB
    • TimeScaleDB
    • ...
  2. Create a benchmark framework to test a selected subset of these systems.
  3. Perform benchmarks on one or multiple low-power hardware platforms.

Contact: (roman.karlstetter@tum.de)

 

Supercomputing and Intra-/Inter-Node Resource Management

BA/MA/GR: Modeling and Algorithms for Efficient Resource/Power Management on Modern HPC Nodes

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both a scientific and an engineering perspective. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Research Summary:

In this work, we will focus on co-scheduling and power management on HPC systems, with a particular focus on heterogeneous compute nodes consisting of multiple different processors (CPU, GPU, etc.) or memory technologies (DRAM, NVRAM, etc.). Recent hardware components generally support a variety of resource partitioning and power control features, such as bandwidth partitioning, compute resource partitioning, clock scaling, and power capping. Our goal in this study is to provide a sophisticated mechanism that comprehensively optimizes these hardware setups, as well as the selection of co-located jobs from a given job set, so that a given objective function (e.g., total throughput) is maximized. For this, we will develop the following: (1) several models (possibly based on machine learning) to predict power, performance, interference, etc., as functions of the hardware setup and the set of co-located jobs; (2) algorithms that optimize the hardware setup and the job selection from a job queue based on the developed models.
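
As a toy illustration of the intended optimization loop, the sketch below exhaustively searches job pairings and power caps against a placeholder prediction model; the model and all numbers are made up and would be replaced by the models developed in this work:

    from itertools import combinations

    def predict_throughput(job_a, job_b, power_cap):
        # Placeholder model: penalize co-locating two memory-bound jobs,
        # and scale (purely illustratively) with the power cap.
        interference = 0.5 if job_a["mem_bound"] and job_b["mem_bound"] else 0.9
        return interference * power_cap / 100.0

    jobs = [
        {"name": "cfd", "mem_bound": True},
        {"name": "dnn", "mem_bound": False},
        {"name": "fft", "mem_bound": True},
    ]
    candidates = [(a, b, cap) for a, b in combinations(jobs, 2)
                  for cap in (60, 80, 100)]  # power caps in watts
    a, b, cap = max(candidates, key=lambda c: predict_throughput(*c))
    print(f"co-schedule {a['name']}+{b['name']} at {cap} W")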

Notes:

  • Due to time limitations, you may tackle a subproblem, such as optimizing resource partitioning on GPUs (e.g., A100), power budgeting across different components, or developing hardware-agnostic power/performance models, any of which would ultimately be a great contribution to the project.
  • There are no formal prerequisites for this topic, but parallel programming and GPU experience/skills will help.
  • You will work together with all the members of the Regale project at this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

BA/MA/GR: Exploring Co-Scheduling and Power Management on HPC Clusters

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both a scientific and an engineering perspective. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Thesis Summary:

In this thesis, we will focus on co-scheduling and power management on HPC clusters, mainly from the job scheduler side (i.e., Slurm, https://slurm.schedmd.com), and will first examine the variety of features supported by the current production-level software stack (i.e., Slurm plus several extensions) on real hardware. The next step will then be one or more of the following, depending on your preferences: (1) list the missing pieces in the software stack needed to realize sophisticated co-scheduling and power management features, and provide architecture-level solutions to realize them; (2) pick one (or more) of the missing features and extend the existing software stack to support it; or (3) propose a job scheduling algorithm that fully exploits the currently supported co-scheduling and power management features (or your newly implemented ones). If necessary, we will also use job scheduling simulators to test our ideas.
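
As a first step, the features an installed Slurm already exposes can be surveyed directly from its configuration; the sketch below (using only the standard scontrol CLI) filters the cluster configuration for power-related settings:

    import subprocess

    config = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in config.splitlines():
        # e.g., PowerParameters, PowerPlugin, SuspendProgram, ResumeProgram
        if any(key in line for key in ("Power", "Suspend", "Resume")):
            print(line.strip())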

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design, so your work will be a significant contribution to the project.
  • There are no formal prerequisites for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project at this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

BA/MA/GR: Coordinating Workflow Engine and Resource/Power Management Software Stack

Background:

As part of the Regale project (https://regale-project.eu/), we are working on holistic resource management mechanisms for supercomputers, from both a scientific and an engineering perspective. The major goal of the project is to provide a prototype software stack that significantly improves total system throughput, energy efficiency, etc., via sophisticated resource management mechanisms, including power and temperature control, co-scheduling (co-locating multiple jobs on a node to maximize resource utilization), and elasticity support (flexibly controlling job/resource scales).

Thesis Summary:

In this thesis, we will focus on workflow engines (e.g., Melissa, https://gitlab.inria.fr/melissa/melissa) and our resource management software stack (incl. Slurm, https://slurm.schedmd.com), and explore the benefits of coordinating them to improve total system throughput, energy efficiency, and other aspects. Workflow engines are useful for running scientific simulations efficiently while varying inputs, conditions, parameters, etc.; Melissa in particular supports several advanced features such as fault tolerance, automatic concurrency handling, and online neural network training. Our goals in this study are: (1) to optimize job scheduling and power/resource management while being explicitly aware of the behavior and characteristics of such workflow-based jobs; and (2) to interact with the workflow engine accordingly and provide the right interface to it for this purpose.

Notes:

  • The research outcomes will provide valuable feedback to the Regale project for the overall software integration and architecture design, so your work will be a significant contribution to the project.
  • There are no formal prerequisites for this topic, but any parallel programming and HPC cluster management experience/skills will help.
  • You will work together with all the members of the Regale project at this chair, and the discussions will be in English.

Contact:

In case of interest, please contact Eishi Arima (eishi.arima@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz)

Memory Management and Optimizations on Heterogeneous HPC Architectures

[RESERVED] MA/BA/GR: Measuring and analyzing data movements across components of heterogeneous nodes

Background:

The DEEP-SEA project (https://www.deep-projects.eu) is a European effort to develop software for upcoming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of them is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly, so unoptimized memory access patterns can have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movements on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it substantially to fit the needs of modern heterogeneous architectures, which are expected to be at the core of upcoming exascale supercomputers. Alongside the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for visualisation.

Context:

Apart from measuring data movements within particular chips (CPUs, GPUs, ...), it is also necessary to understand the data movements on a node, i.e., the data transfers and accesses between the different components of a node.

Tasks/Goals:

  • Identify what information about data movement between the different components of a node can be collected, e.g., traffic on a system bus (PCIe, NVLink, ...), of which PCIe is probably the most important one (see the sketch after this list).
  • Propose, design, and implement a solution for collecting the data.
  • (MA only) In addition, the type of connection and some of its properties (e.g., maximum bandwidth) could be collected to help assess the actual usage.
  • (MA only) Finally, the measured data can be presented in MemAxes so that potential bottlenecks can be identified by the user.
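
On nodes with NVIDIA GPUs, one possible starting point is to sample PCIe throughput via nvidia-smi; the sketch below assumes the dmon output has a "gpu rxpci txpci" column layout, which may vary by driver version:

    import subprocess

    # Sample PCIe Rx/Tx throughput (MB/s) ten times via `nvidia-smi dmon -s t`.
    proc = subprocess.Popen(
        ["nvidia-smi", "dmon", "-s", "t", "-c", "10"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        if line.startswith("#"):
            continue  # skip header lines
        gpu, rx_mb_s, tx_mb_s = line.split()[:3]
        print(f"GPU {gpu}: PCIe rx={rx_mb_s} MB/s, tx={tx_mb_s} MB/s")

A more general solution would need per-bus counters (e.g., uncore/PCIe performance events), which is exactly the design space this topic explores.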

Note:
Especially for BA students, the learning curve may be rather steep; therefore, some prior knowledge of low-level programming and/or bus systems would be beneficial (but is not necessary). For GR, this topic may be adjusted towards a scientific survey with experimentation to discover what is and is not possible.

Contact:
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz) and attach your CV & transcript of records.

Published on 08.10.2021

MA (Games Engineering)/BA: Redesigning MemAxes data visualisations to handle more complex heterogeneous architectures

Background:

The DEEP-SEA project (https://www.deep-projects.eu) is a European effort to develop software for upcoming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of them is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly, so unoptimized memory access patterns can have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movements on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it substantially to fit the needs of modern heterogeneous architectures, which are expected to be at the core of upcoming exascale supercomputers. Alongside the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for visualisation.

Context:

  • MemAxes was originally built to visualise data gathered on a single (multi-core) CPU.
  • We want to make our analysis with MemAxes more comprehensive and therefore want to collect information about multiple chips on a node. A modern heterogeneous node can contain multiple CPUs, GPUs, or possibly FPGAs; in the future, network cards may also become of interest.

Tasks/Goals: 

  • Adapt MemAxes to support heterogeneous nodes (nodes containing multiple CPUs and GPUs; a universal solution that would fit other types of chips would be nice, but is not a must).
  • As there will be more information than fits on one screen, come up with a solution that offers both an overview and sufficient detail (e.g., zooming in/out of the different chips on a node): adapt the visualisation, the user interface, and the data/aggregates being presented based on which parts of the system are currently shown (or provide an alternative solution that is intuitive and clear).

Contact:

In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz) and attach your CV & transcript of records.

Published on 08.10.2021

MA: Integrating aggregate data measurements into Mitos and their evaluation in MemAxes

Background:
The DEEP-SEA project (https://www.deep-projects.eu) is a European effort to develop software for upcoming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of them is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly, so unoptimized memory access patterns can have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movements on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it substantially to fit the needs of modern heterogeneous architectures, which are expected to be at the core of upcoming exascale supercomputers. Alongside the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for visualisation.

Context:

  • The current implementation of Mitos/MemAxes collects PEBS samples of memory accesses (via perf), i.e., every n-th memory operation is measured and stored.
  • Collecting aggregate data alongside the PEBS samples could help increase the overall understanding of the system and application behaviour.

Tasks/Goals: 

  • Analyse which aggregate data are meaningful and feasible to collect (total traffic, bandwidth utilization, number of loads/stores, ...) and how to collect them (PAPI, LIKWID, perf, ...); a minimal sketch follows this list.
  • Ensure that these measurements do not interfere with the existing collection of PEBS samples.
  • Design and implement a low-overhead solution.
  • Find a way to visualise/present the data in the MemAxes tool (or a different visualisation tool if MemAxes is not suitable).
  • Finally, present how the newly collected data help users understand the system, or hint at whether/how to optimize.
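
A minimal sketch of one collection path, reading aggregate counters around an arbitrary workload via perf in CSV mode (event availability differs per CPU, and the chosen events here are only examples):

    import subprocess

    events = "instructions,cache-misses"
    result = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", events, "--", "sleep", "1"],
        capture_output=True, text=True,
    )
    # perf stat writes its counters to stderr, one CSV line per event:
    # value,unit,event,run-time,percentage,...
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) > 2:
            print(f"{fields[2]}: {fields[0]}")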

Contact:

In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).
 

Published on 21.09.2021

MA: Memory access data sample collection on AMD CPUs

Background:
The DEEP-SEA project (https://www.deep-projects.eu) is a European effort to develop software for upcoming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of them is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly, so unoptimized memory access patterns can have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movements on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it substantially to fit the needs of modern heterogeneous architectures, which are expected to be at the core of upcoming exascale supercomputers. Alongside the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for visualisation.

Context:

  • Intel PEBS (Precise Event-Based Sampling) enables memory access data collection on modern Intel CPUs and is used by the Mitos project (https://github.com/LLNL/Mitos) to collect data access samples for MemAxes.
  • AMD's IBS (Instruction-Based Sampling) should offer similar functionality; however, it is not supported by Mitos at the moment.
  • To increase the versatility of the Mitos/MemAxes projects, their functionality should not be limited to Intel CPUs.

Tasks/Goals: 

  • Investigate the functionality of IBS and find out whether it can provide the same data on AMD CPUs as PEBS does on Intel CPUs. If not, research possible alternatives for supplementing the missing functionality. If IBS provides additional relevant information, propose how MemAxes could use this data.
  • Implement memory access data collection on AMD CPUs and integrate it into the Mitos project.
  • Design logic in Mitos that automatically switches between AMD and Intel processors, so that the same code can be compiled and run on both platforms (see the sketch after this list).
  • Present the data collected from an example application in MemAxes.
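
The switching logic itself can be as simple as a vendor check; the Python sketch below only illustrates the decision, which in Mitos would live in the C/C++ code path selection:

    def cpu_vendor():
        # Read the vendor string of the first CPU from /proc/cpuinfo.
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("vendor_id"):
                    return line.split(":")[1].strip()
        return "unknown"

    vendor = cpu_vendor()
    if vendor == "GenuineIntel":
        print("use the PEBS-based sampling path")
    elif vendor == "AuthenticAMD":
        print("use the IBS-based sampling path")
    else:
        print(f"unsupported CPU vendor: {vendor}")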

Contact:
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).

Published on 21.09.2021

[RESERVED] BA: Collection of GPU memory hierarchy information

Background:
The DEEP-SEA project (https://www.deep-projects.eu) is a European effort to develop software for upcoming exascale supercomputing architectures. As a member of the project, CAPS TUM works in several areas. One of them is the development of a tool for analyzing application performance by identifying suboptimal memory behaviour of an application. Memory operations are very costly, so unoptimized memory access patterns can have a huge negative impact on overall performance. For this reason, analyzing and optimizing data movements on a single node can play a very important role in increasing the performance of parallel applications. We took over the MemAxes tool (https://github.com/LLNL/MemAxes), originally developed at LLNL, as the base for our memory access visualisation tool, and plan to extend and improve it substantially to fit the needs of modern heterogeneous architectures, which are expected to be at the core of upcoming exascale supercomputers. Alongside the MemAxes visualisation tool, we develop the Mitos tool (https://github.com/LLNL/Mitos), which collects and provides the data for visualisation.

Context:

  • hwloc gives a good overview of the CPU memory hierarchy. This information is uploaded to the sys-topo library, where it represents the CPU.
  • However, hwloc provides only very little information about the memory and compute-unit hierarchy/grouping on GPUs. Therefore, we need to find a different way to gather this data.

Tasks/Goals: 

  • Find ways to obtain the information (focus on NVIDIA and/or AMD GPUs); a minimal NVIDIA-side sketch follows this list.
  • Parse the obtained information and upload it to the sys-topo library, similarly to how the hwloc output for CPUs is parsed and uploaded.
  • (Optional) Ensure that MemAxes visualises the GPU hierarchy properly, or propose/make updates to support GPUs.
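
For NVIDIA GPUs, part of this information is available through NVML; the sketch below (assuming the pynvml bindings, and omitting the sys-topo upload) queries each device's name and memory size:

    import pynvml

    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)  # bytes in older pynvml versions
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {name}, {mem.total // 2**20} MiB total memory")
    pynvml.nvmlShutdown()

Finer-grained details (SM grouping, cache sizes) would need CUDA device attributes or vendor-specific tools, which is part of what this topic should investigate.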

Contact:
In case of interest, please contact Stepan Vanecek (stepan.vanecek@tum.de) at the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).

Published on 21.09.2021

Various MPI-Related Topics

Please note: MPI is a high-performance programming model and communication library designed for HPC applications. It is designed and standardised by the members of the MPI Forum, which includes various research, academic, and industrial institutions. The current chair of the MPI Forum is Prof. Dr. Martin Schulz.

The following topics are all available as a Master's thesis or Guided Research. They will be advised and supervised by Prof. Dr. Martin Schulz himself, with the help of researchers from the chair. If you are very familiar with MPI and parallel programming, please don't hesitate to drop a mail to Prof. Dr. Martin Schulz.

These topics are mostly related to current research and active discussions in the MPI Forum, which are subject to standardisation in the coming years. Your contributions to these topics may make you a contributor to the MPI standard, and your implementation may become part of the code base of Open MPI. Many of these topics require collaboration with other MPI research bodies, such as Lawrence Livermore National Laboratory and the Innovative Computing Laboratory. Some of these topics may require you to attend MPI Forum meetings, which take place in the late afternoon (to accommodate participants worldwide). Generally, these advanced topics may require more effort to understand and may be more time-consuming, but they are more prestigious, too.

MA/GR: Porting LAIK to Elastic MPI & ULFM

LAIK is a new programming abstraction developed at LRR-TUM

  • Decouple data decomposition and computation, while hiding communication
  • Applications work on index spaces
  • Mapping of index spaces to nodes can be adaptive at runtime
  • Goal: dynamic process management and fault tolerance
  • Current status: works on standard MPI, but no dynamic support

Task 1: Port LAIK to Elastic MPI

  • New model developed locally that allows process additions and removal
  • Should be very straightforward

Task 2: Port LAIK to ULFM

  • Proposed MPI FT Standard for “shrinking” recovery, prototype available
  • Requires refactoring of code and evaluation of ULFM

Task 3: Compare performance with direct implementations of same models on MLEM

  • Medical image reconstruction code
  • Requires porting MLEM to both Elastic MPI and ULFM

Task 4: Comprehensive Evaluation

MA/GR: Lazy Non-Collective Shrinking in ULFM

ULFM (User-Level Failure Mitigation) is the current proposal for MPI fault tolerance

  • Failures make communicators unusable
  • Once detected, communicators can be “shrunk”
  • Detection is active and synchronous by capturing error codes
  • Shrinking is collective, typically after a global agreement
  • Problem: can lead to deadlocks

Alternative idea

  • Make shrinking lazy and with that non-collective
  • New, smaller communicators are created on the fly

Tasks:

  • Formalize non-collective shrinking idea
  • Propose API modifications to ULFM
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI forum

MA/GR: A New FT Model with “Hole-Y” Shrinking

ULFM works on the classic MPI assumptions

  • Complete communicator must be working
  • No holes in the rank space are allowed
  • Collectives always work on all processes

Alternative: break these assumptions

  • A failure creates a communicator with a hole
  • Point to point operations work as usual
  • Collectives work (after acknowledgement) on reduced process set

Tasks:

  • Formalize “hole-y” shrinking
  • Propose new API
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

MA/GR: Prototype for MPI_T_Events

With MPI 3.0, MPI added a second tools interface: MPI_T

  • Access to internal variables 
  • Query, read, write
  • Performance and configuration information
  • Missing: event information using callbacks
  • New proposal in the MPI Forum (driven by RWTH Aachen)
  • Add event support to MPI_T
  • Proposal is rather complete

Tasks:

  • Implement prototype in either Open MPI or MVAPICH
  • Identify a series of events that are of interest
  • Message queuing, memory allocation, transient faults, …
  • Implement events for these through MPI_T
  • Develop tool using MPI_T to write events into a common trace format
  • Performance evaluation

Possible collaboration with RWTH Aachen

 

MA/GR: Evaluation of PMIx on MPICH and SLURM

PMIx is a proposed resource management layer for runtimes (for Exascale)

  • Enables MPI runtime to communicate with resource managers
  • Came out of previous PMI efforts as well as the Open MPI community
  • Under active development / prototype available on Open MPI

Tasks: 

  • Implement PMIx on top of MPICH or MVAPICH
  • Integrate PMIx into SLURM
  • Evaluate implementation and compare to Open MPI implementation
  • Assess and possibly extend interfaces for tools
  • Query process sets

MA/GR: Active Messaging for Charm++ or Legion

MPI was originally intended as runtime support, not as an end-user API

  • Several other programming models use it that way
  • However, often not first choice due to performance reasons
  • Especially task/actor based models require more asynchrony

Question: can more asynchronous models be added to MPI?

  • Example: active messages

Tasks:

  • Understand communication modes in an asynchronous model
  • Charm++: actor-based (UIUC)
  • Legion: task-based (Stanford, LANL)
  • Propose extensions to MPI that capture this model better
  • Implement prototype in Open MPI or MVAPICH
  • Evaluation and Documentation

Possible collaboration with LLNL and/or BSC

MA/GR: Crazy Idea: Multi-MPI Support

MPI can and should be used for more than compute

  • Could be runtime system for any communication
  • Example: traffic to visualization / desktops

Problem:

  • Different network requirements and layers
  • May require different MPI implementations
  • Common protocol is unlikely to be accepted

Idea: can we use a bridge node with two MPIs linked to it

  • User should see only two communicators, but same API

Tasks:

  • Implement this concept coupling two MPIs
  • Open MPI on compute cluster and TCP MPICH to desktop
  • Demonstrate using on-line visualization streaming to front-end
  • Document and provide evaluation
  • Warning: likely requires good understanding of linkers and loaders

Field-Programmable Gate Arrays

Field-Programmable Gate Arrays (FPGAs) are considered to be the next generation of accelerators. Their advantages range from improved energy efficiency for machine learning to faster routing decisions in network controllers. If you are interested in one of these topics, please send your CV and transcript of records to the specified email address.

Our chair offers various topics in this area:

  • Direct network operations: Here, FPGAs are wired closer to the networking hardware itself, which allows bypassing the network stack that regular CPU-style communication would be exposed to. Your task would be to investigate FPGAs that can interact with the network more closely than CPU-based approaches. (martin.schreiber@tum.de)
  • Linear algebra: Your task would be to explore strategies to accelerate existing linear algebra routines on FPGA systems, taking into account application requirements. (martin.schreiber@tum.de)
  • Varying accuracy of computations: Current floating-point computations use a granularity of 16, 32, or 64 bits. Your work would be on tailoring the accuracy of computations to what is really required. (martin.schreiber@tum.de)
  • ODE solver: You would work on an automatic toolchain for solving ODEs originating from computational biology. (martin.schreiber@tum.de)
  • Data mining: You would explore the deployment of a class of data mining algorithms, such as matrix profile and k-nearest neighbors, for mining similarities, irregularities, and anomalies in datasets from various fields. (amir.raoofy@tum.de)
  • Machine learning: We are developing a benchmark to enable the evaluation of ML inference for Earth observation, e.g., ship detection in satellite imagery. You would evaluate various approaches (HLS4ML, FINN and Brevitas, and Vitis AI) for the deployment of ML models (e.g., convolutional neural networks) on an FPGA-based accelerated system. (amir.raoofy@tum.de)

 

Applied mathematics & high-performance computing

There are various topics available in the area bridging applied mathematics and high-performance computing. Please note that this will be supervised externally by Prof. Dr. Martin Schreiber (a former member of this chair, now at Université Grenoble Alpes).

This is just a selection of some topics to give some inspiration:

(MA=Master in Math/CS, CSE=Comput. Sc. and Engin.)

  • HPC tools:
    • Automated Application Performance Characteristics Extraction
    • Portable performance assessment for programs with flat performance profile, BA, MA, CSE
  • Projects targeting Weather (and climate) forecasting
    • Implementation and performance assessment of ML-SDC/PFASST in OpenIFS (collaboration with the European Centre for Medium-Range Weather Forecasts), CSE, MA
    • Efficient realization of fast associated Legendre transformations on GPUs (collaboration with the European Centre for Medium-Range Weather Forecasts), CSE, MA
    • Fast exponential and implicit time integration, BA, MA, CSE
    • MPI parallelization for the SWEET research software, MA, CSE
    • Semi-Lagrangian methods with Parareal, CSE, MA
    • Non-interpolating Semi-Lagrangian Schemes, CSE, MA
    • Time-splitting methods for exponential integrators, CSE, MA
    • Machine learning for non-linear time integration, CSE, MA
    • Exponential integrators and higher-order Semi-Lagrangian methods

  • Ocean simulations:
    • Porting the NEMO ocean simulation framework to GPUs with a source-to-source compiler
    • Porting the Croco ocean simulation framework to GPUs with a source-to-source compiler
       
  • Health science project: Biological parameter optimization
    • Extending a domain-specific language with time integration methods
    • Performance assessment and improvements for different hardware backends (GPUs / FPGAs / CPUs)

If you are interested in any of these projects, or if you are searching for projects in this area, please drop me an email for further information.

In-Situ/In-Transit Data Transformation Using Low-Power Processors

MA: Integrating GPU-Based In-Situ/In-Transit Tasks

Background:

The ADMIRE project [1] is a European-funded project to accelerate the processing of extremely large data sets by creating an active I/O stack in high-performance computing (HPC) architectures. As part of the project, in-situ/in-transit [2] data transformations can decrease or avoid the need for I/O and thus the negative influence of the I/O subsystem. This technique can also enable new types of data analysis in scientific simulations, including data mining and AI techniques. One key task is to enable in-situ/in-transit processing on different architectures (e.g., CPU, GPU, or DPU).


Context:

  • The Adaptable Input Output System 2 (ADIOS2) [3] is a framework for scientific data I/O to publish and subscribe to data when and where required. It allows in-situ/in-transit processing on CPUs with different “engines” selected in runtime configurations. However, it currently does not support GPUs. A minimal sketch of the CPU-side usage is shown below.
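
A hedged sketch of the current CPU-side workflow, assuming the ADIOS2 high-level Python bindings: the code writes one variable per rank, while the engine (e.g., SST for in-transit staging) is selected in the adios2.xml runtime configuration rather than in the code. The file, variable, and IO names are illustrative:

    import numpy as np
    from mpi4py import MPI
    import adios2

    comm = MPI.COMM_WORLD
    n = 1024
    data = np.full(n, comm.Get_rank(), dtype=np.float64)

    # The IO name "SimulationOutput" must match an <io> block in adios2.xml,
    # whose <engine type="..."> entry picks file-based vs. in-transit output.
    with adios2.open("output.bp", "w", comm, "adios2.xml", "SimulationOutput") as fh:
        fh.write("temperature", data,
                 [comm.Get_size() * n],   # global shape
                 [comm.Get_rank() * n],   # local offset
                 [n])                     # local count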


Goals:

  • Integrate GPU-based in-situ/in-transit support as a new “engine” in ADIOS2, using CUDA and potentially also OpenACC
  • Analyze the performance of the solution.
  • Present the solution with one real case study.


Prerequisites:

  • Good programming knowledge of C++
  • Knowledge of GPU-programming (CUDA and/or OpenACC)
  • Interest in large-scale parallel computing


Offers:

  • Integration into a large international EU project
  • Access to large-scale HPC systems at MPCDF
  • Supervision and support during the thesis
  • Opportunity to publish a research paper or present at a conference


Contact:

In case of interest, please contact Yi Ju (yi.ju@mpcdf.mpg.de) at the Max Planck Computing and Data Facility (Prof. Laure) and the Chair for Computer Architecture and Parallel Systems (Prof. Schulz).


References:

[1] “Adaptive multi-tier intelligent data manager for exascale (https://www.admire-eurohpc.eu/).”
[2] H. Childs et al., “A terminology for in situ visualization and analysis systems,” The International Journal of High Performance Computing Applications, vol. 34, no. 6, pp. 676–691, 2020.
[3] W. F. Godoy et al., “ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management,” SoftwareX, p. 9, 2020.