Minchen Yu
The Chinese University of Hong Kong, Shenzhen
Verified email at cuhk.edu.cn - Homepage
Title | Cited by | Year
MArk: Exploiting cloud services for Cost-Effective, SLO-Aware machine learning inference serving
C Zhang, M Yu, W Wang, F Yan
2019 USENIX Annual Technical Conference (USENIX ATC 19), 1049-1062, 2019
Cited by 244 | 2019
Gillis: Serving large neural networks in serverless functions with automatic model partitioning
M Yu, Z Jiang, HC Ng, W Wang, R Chen, B Li
2021 IEEE 41st International Conference on Distributed Computing Systems …, 2021
Cited by 48 | 2021
Continuum: A platform for cost-aware, low-latency continual learning
H Tian, M Yu, W Wang
Proceedings of the ACM Symposium on Cloud Computing, 26-40, 2018
Cited by 35 | 2018
Following the data, not the function: Rethinking function orchestration in serverless computing
M Yu, T Cao, W Wang, R Chen
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Cited by 32* | 2023
Enabling cost-effective, SLO-aware machine learning inference serving on public cloud
C Zhang, M Yu, W Wang, F Yan
IEEE Transactions on Cloud Computing 10 (3), 1765-1779, 2020
Cited by 23 | 2020
CrystalPerf: Learning to Characterize the Performance of Dataflow Computation through Code Analysis
H Tian, M Yu, W Wang
2021 USENIX Annual Technical Conference (USENIX ATC 21), 253-267, 2021
Cited by 3 | 2021
RepBun: Load-balanced, shuffle-free cluster caching for structured data
M Yu, Y Yu, Y Zheng, B Yang, W Wang
IEEE INFOCOM 2020-IEEE Conference on Computer Communications, 954-963, 2020
Cited by 3 | 2020
FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping
M Yu, A Wang, D Chen, H Yu, X Luo, Z Li, W Wang, R Chen, D Nie, ...
arXiv preprint arXiv:2306.03622, 2023
Cited by 2 | 2023
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
S Li, H Lu, T Wu, M Yu, Q Weng, X Chen, Y Shan, B Yuan, W Wang
arXiv preprint arXiv:2401.11240, 2024
Cited by 1 | 2024