Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA G Bosilca, A Bouteiller, A Danalis, M Faverge, A Haidar, T Herault, ... 2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011 | 242* | 2011 |

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers A Haidar, S Tomov, J Dongarra, NJ Higham SC18: International Conference for High Performance Computing, Networking …, 2018 | 227 | 2018 |

Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA G Bosilca, A Bouteiller, A Danalis, M Faverge, A Haidar, T Herault, ... 2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011 | 192 | 2011 |

Performance, design, and autotuning of batched GEMM for GPUs A Abdelfattah, A Haidar, S Tomov, J Dongarra High Performance Computing: 31st International Conference, ISC High …, 2016 | 128 | 2016 |

Accelerating numerical dense linear algebra calculations with GPUs J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ... Numerical computations with GPUs, 3-28, 2014 | 128 | 2014 |

Seismic wave modeling for seismic imaging J Virieux, S Operto, H Ben-Hadj-Ali, R Brossier, V Etienne, F Sourbier, ... The Leading Edge 28 (5), 538-544, 2009 | 120 | 2009 |

The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ... SIAM Review 60 (4), 808-865, 2018 | 83 | 2018 |

Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels A Haidar, H Ltaief, J Dongarra Proceedings of 2011 International Conference for High Performance Computing …, 2011 | 81 | 2011 |

Batched matrix computations on hardware accelerators based on GPUs A Haidar, T Dong, P Luszczek, S Tomov, J Dongarra The International Journal of High Performance Computing Applications 29 (2 …, 2015 | 75 | 2015 |

High-performance tensor contractions for GPUs A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ... Procedia Computer Science 80, 108-118, 2016 | 74 | 2016 |

Investigating half precision arithmetic to accelerate dense linear system solvers A Haidar, P Wu, S Tomov, J Dongarra Proceedings of the 8th workshop on latest advances in scalable algorithms …, 2017 | 71 | 2017 |

High-performance matrix-matrix multiplications of very small matrices I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ... Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016 | 69 | 2016 |

Parallel programming models for dense linear algebra on heterogeneous systems J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ... Supercomputing Frontiers and Innovations 2 (4), 67-86, 2015 | 63 | 2015 |

LU factorization of small matrices: Accelerating batched DGETRF on the GPU T Dong, A Haidar, P Luszczek, JA Harris, S Tomov, J Dongarra 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 …, 2014 | 60 | 2014 |

Unified development for mixed multi-GPU and multi-coprocessor environments using a lightweight runtime environment A Haidar, C Cao, A Yarkhan, P Luszczek, S Tomov, K Kabir, J Dongarra 2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014 | 58* | 2014 |

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems A Haidar, H Bayraktar, S Tomov, J Dongarra, NJ Higham Proceedings of the Royal Society A 476 (2243), 20200110, 2020 | 54 | 2020 |

A framework for batched and GPU-resident factorization algorithms applied to block Householder transformations A Haidar, TT Dong, S Tomov, P Luszczek, J Dongarra High Performance Computing: 30th International Conference, ISC High …, 2015 | 54 | 2015 |

An improved parallel singular value algorithm and its implementation for multicore hardware A Haidar, J Kurzak, P Luszczek Proceedings of the International Conference on High Performance Computing …, 2013 | 52 | 2013 |

PLASMA: Parallel linear algebra software for multicore using OpenMP J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, P Wu, I Yamazaki, ... ACM Transactions on Mathematical Software (TOMS) 45 (2), 1-35, 2019 | 51 | 2019 |

Investigating power capping toward energy‐efficient scientific applications A Haidar, H Jagode, P Vaccaro, A YarKhan, S Tomov, J Dongarra Concurrency and Computation: Practice and Experience 31 (6), e4485, 2019 | 51 | 2019 |