BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1209 | 2023 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 615* | 2022 |
SPLATT: Efficient and parallel sparse tensor-matrix multiplication S Smith, N Ravindran, ND Sidiropoulos, G Karypis 2015 IEEE International Parallel and Distributed Processing Symposium, 61-70, 2015 | 268 | 2015 |
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He Proceedings of the international conference for high performance computing …, 2021 | 213 | 2021 |
FROSTT: The Formidable Repository of Open Sparse Tensors and Tools S Smith, JW Choi, J Li, R Vuduc, J Park, X Liu, G Karypis http://frostt.io/, 2017 | 156 | 2017 |
Tensor-Matrix Products with a Compressed Sparse Tensor S Smith, G Karypis 5th Workshop on Irregular applications: Architectures and Algorithms (IA^3), 2015 | 145 | 2015 |
DeepSpeed-Inference: enabling efficient inference of transformer models at unprecedented scale RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... SC22: International Conference for High Performance Computing, Networking …, 2022 | 141 | 2022 |
Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations N Srivastava, H Jin, S Smith, H Rong, D Albonesi, Z Zhang 2020 IEEE International Symposium on High Performance Computer Architecture …, 2020 | 108 | 2020 |
A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization S Smith, G Karypis Parallel and Distributed Processing Symposium (IPDPS), 2016 IEEE International, 2016 | 105* | 2016 |
Bridging the gap between HPC and big data frameworks M Anderson, S Smith, N Sundaram, M Capotă, Z Zhao, S Dulloor, ... Proceedings of the VLDB Endowment 10 (8), 901-912, 2017 | 77 | 2017 |
Accelerating the Tucker decomposition with compressed sparse tensors S Smith, G Karypis European Conference on Parallel Processing, 653-668, 2017 | 68 | 2017 |
Truss Decomposition on Shared-Memory Parallel Systems S Smith, X Liu, NK Ahmed, AS Tom, F Petrini, G Karypis IEEE High Performance Extreme Computing Conference (HPEC), 2017 | 58 | 2017 |
Streaming tensor factorization for infinite data sources S Smith, K Huang, ND Sidiropoulos, G Karypis Proceedings of the 2018 SIAM International Conference on Data Mining, 81-89, 2018 | 55 | 2018 |
Sparse tensor factorization on many-core processors with high-bandwidth memory S Smith, J Park, G Karypis 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 43 | 2017 |
Big data frequent pattern mining DC Anastasiu, J Iverson, S Smith, G Karypis Frequent Pattern Mining, 225-259, 2014 | 43 | 2014 |
An Exploration of Optimization Algorithms for High Performance Tensor Completion S Smith, J Park, G Karypis Proceedings of the 2016 ACM/IEEE Conference on Supercomputing (SC '16), 2016 | 40 | 2016 |
Memory-efficient parallel computation of tensor and matrix products for big tensor decomposition N Ravindran, ND Sidiropoulos, S Smith, G Karypis 2014 48th Asilomar Conference on Signals, Systems and Computers, 581-585, 2014 | 38 | 2014 |
Blocking optimization techniques for sparse tensor computation J Choi, X Liu, S Smith, T Simon 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2018 | 37 | 2018 |
Exploring Optimizations on Shared-memory Platforms for Parallel Triangle Counting Algorithms AS Tom, N Sundaram, NK Ahmed, S Smith, S Eyerman, M Kodiyath, I Hur, ... IEEE High Performance Extreme Computing Conference (HPEC), 2017 | 33 | 2017 |
Constrained Tensor Factorization with Accelerated AO-ADMM S Smith, A Beri, G Karypis 46th International Conference on Parallel Processing (ICPP '17), 2017 | 32 | 2017 |