no code implementations • 27 Jun 2021 • Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert, Oscar Hernandez
This paper presents a methodology for using LLVM-based tools to tune the DCA++ (dynamical clusterapproximation) application that targets the new ARM A64FX processor.
1 code implementation • 7 Mar 2019 • Tianyi Zhang, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, Hartmut Kaiser
Not only must users port their own codes, but often users rely on highly optimized libraries such as BLAS and LAPACK which use OpenMP for parallization.
Distributed, Parallel, and Cluster Computing