Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers A Haidar, S Tomov, J Dongarra, NJ Higham SC18: International Conference for High Performance Computing, Networking …, 2018 | 268 | 2018 |
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA G Bosilca, A Bouteiller, A Danalis, M Faverge, A Haidar, T Herault, ... 2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011 | 254* | 2011 |
Performance, design, and autotuning of batched GEMM for GPUs A Abdelfattah, A Haidar, S Tomov, J Dongarra High Performance Computing: 31st International Conference, ISC High …, 2016 | 144 | 2016 |
Accelerating numerical dense linear algebra calculations with GPUs J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ... Numerical Computations with GPUs, 3-28, 2014 | 142 | 2014 |
Seismic wave modeling for seismic imaging J Virieux, S Operto, H Ben-Hadj-Ali, R Brossier, V Etienne, F Sourbier, ... The Leading Edge 28 (5), 538-544, 2009 | 130 | 2009 |
The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ... SIAM Review 60 (4), 808-865, 2018 | 111 | 2018 |
Investigating half precision arithmetic to accelerate dense linear system solvers A Haidar, P Wu, S Tomov, J Dongarra Proceedings of the 8th workshop on latest advances in scalable algorithms …, 2017 | 84 | 2017 |
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels A Haidar, H Ltaief, J Dongarra Proceedings of 2011 International Conference for High Performance Computing …, 2011 | 83 | 2011 |
High-performance tensor contractions for GPUs A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ... Procedia Computer Science 80, 108-118, 2016 | 80 | 2016 |
RETRACTED: Batched matrix computations on hardware accelerators based on GPUs A Haidar, T Dong, P Luszczek, S Tomov, J Dongarra The International Journal of High Performance Computing Applications 29 (2 …, 2015 | 79 | 2015 |
PLASMA: Parallel linear algebra software for multicore using OpenMP J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, P Wu, I Yamazaki, ... ACM Transactions on Mathematical Software (TOMS) 45 (2), 1-35, 2019 | 71 | 2019 |
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems A Haidar, H Bayraktar, S Tomov, J Dongarra, NJ Higham Proceedings of the Royal Society A 476 (2243), 20200110, 2020 | 69 | 2020 |
High-performance matrix-matrix multiplications of very small matrices I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ... Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016 | 69 | 2016 |
heFFTe: Highly Efficient FFT for Exascale A Ayala, S Tomov, A Haidar, J Dongarra International Conference on Computational Science, 262-275, 2020 | 66 | 2020 |
LU factorization of small matrices: Accelerating batched DGETRF on the GPU T Dong, A Haidar, P Luszczek, JA Harris, S Tomov, J Dongarra 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 …, 2014 | 64 | 2014 |
Parallel programming models for dense linear algebra on heterogeneous systems J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ... Supercomputing Frontiers and Innovations 2 (4), 67-86, 2015 | 63 | 2015 |
Unified development for mixed multi-GPU and multi-coprocessor environments using a lightweight runtime environment A Haidar, C Cao, A Yarkhan, P Luszczek, S Tomov, K Kabir, J Dongarra 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 60* | 2014 |
A framework for batched and GPU-resident factorization algorithms applied to block Householder transformations A Haidar, TT Dong, S Tomov, P Luszczek, J Dongarra High Performance Computing: 30th International Conference, ISC High …, 2015 | 58 | 2015 |
Investigating power capping toward energy-efficient scientific applications A Haidar, H Jagode, P Vaccaro, A YarKhan, S Tomov, J Dongarra Concurrency and Computation: Practice and Experience 31 (6), e4485, 2019 | 57 | 2019 |
The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques A Haidar, A Abdelfattah, M Zounon, P Wu, S Pranesh, S Tomov, ... International Conference on Computational Science, 586-600, 2018 | 55 | 2018 |