Yupan Huang
Seeing out of the box: End-to-end pre-training for vision-language representation learning
Z Huang*, Z Zeng*, Y Huang*, B Liu, D Fu, J Fu
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Y Huang, T Lv, L Cui, Y Lu, F Wei
Proceedings of the 30th ACM International Conference on Multimedia, 2022
Probing inter-modality: Visual parsing with self-attention for vision-and-language pre-training
H Xue, Y Huang, B Liu, H Peng, J Fu, H Li, J Luo
Advances in Neural Information Processing Systems 34, 4514-4528, 2021
Decoupling localization and classification in single shot temporal action detection
Y Huang, Q Dai, Y Lu
2019 IEEE International Conference on Multimedia and Expo (ICME), 1288-1293, 2019
Unifying multimodal transformer for bi-directional image and text generation
Y Huang, H Xue, B Liu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 1138-1147, 2021
Reinforced short-length hashing
X Liu, X Nie, Q Dai, Y Huang, L Lian, Y Yin
IEEE Transactions on Circuits and Systems for Video Technology 31 (9), 3655-3668, 2020
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Y Huang, Z Meng, F Liu, Y Su, N Collier, Y Lu
arXiv preprint arXiv:2308.16463, 2023
A picture is worth a thousand words: A unified system for diverse captions and rich images generation
Y Huang, B Liu, J Fu, Y Lu
Proceedings of the 29th ACM International Conference on Multimedia, 2792-2794, 2021
Be Specific, Be Clear: Bridging Machine and Human Captions by Scene-Guided Transformer
Y Huang, Z Zeng, Y Lu
Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia …, 2021
TextDiffuser: Diffusion Models as Text Painters
J Chen*, Y Huang*, T Lv, L Cui, Q Chen, F Wei
NeurIPS, 2023
Kosmos-2.5: A Multimodal Literate Model
T Lv*, Y Huang*, J Chen*, L Cui*, S Ma, Y Chang, S Huang, W Wang, ...
arXiv preprint arXiv:2309.11419, 2023
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei
arXiv preprint arXiv:2311.16465, 2023