Xiaotian Han
OpenAI
Verified email at openai.com - Homepage
Title
Cited by
Year
Exploring the reasoning abilities of multimodal large language models (MLLMs): A comprehensive survey on emerging trends in multimodal reasoning
Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu, B Zhai, J Yuan, Q You, ...
arXiv preprint arXiv:2401.06805, 2024
79* · 2024
MMPTrack: Large-scale densely annotated multi-camera multiple people tracking benchmark
X Han, Q You, C Wang, Z Zhang, P Chu, H Hu, J Wang, Z Liu
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2023
38* · 2023
Real-time micro-scale temperature imaging at low cost based on fluorescent intensity ratio
J Xiong, M Zhao, X Han, Z Cao, X Wei, Y Chen, C Duan, M Yin
Scientific Reports 7 (1), 41311, 2017
38 · 2017
Image scene graph generation (SGG) benchmark
X Han, J Yang, H Hu, L Zhang, J Gao, P Zhang
arXiv preprint arXiv:2107.12604, 2021
36 · 2021
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
X Han, Q You, Y Liu, W Chen, H Zheng, K Mrini, X Lin, Y Wang, B Zhai, ...
arXiv preprint arXiv:2311.11567, 2023
16* · 2023
ViTAR: Vision transformer with any resolution
Q Fan, Q You, X Han, Y Liu, Y Tao, H Huang, R He, H Yang
arXiv preprint arXiv:2403.18361, 2024
14 · 2024
InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model
H Liu, Q You, Y Wang, X Han, B Zhai, Y Liu, W Chen, Y Jian, Y Tao, ...
Findings of the Association for Computational Linguistics ACL 2024, 485-492, 2024
10* · 2024
InfiMM-WebMath-40B: Advancing multimodal pre-training for enhanced mathematical reasoning
X Han, Y Jian, X Hu, H Liu, Y Wang, Q Fan, Y Ai, H Huang, R He, Z Yang, ...
arXiv preprint arXiv:2409.12568, 2024
9 · 2024
InfiMM-HD: A leap forward in high-resolution multimodal understanding
H Liu, Q You, X Han, Y Wang, B Zhai, Y Liu, Y Tao, H Huang, R He, ...
arXiv preprint arXiv:2403.01487, 2024
8* · 2024
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Y Ai, X Zhou, H Huang, X Han, Z Chen, Q You, H Yang
Advances in Neural Information Processing Systems 37, 55443-55469, 2025
5* · 2025
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Y Liu, P Li, Z Wei, C Xie, X Hu, X Xu, S Zhang, X Han, H Yang, F Wu
arXiv preprint arXiv:2501.04575, 2025
5 · 2025
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
H Liu, Q You, X Han, Y Liu, H Huang, R He, H Yang
Advances in Neural Information Processing Systems 37, 17696-17718, 2024
3 · 2024
COCO is “ALL” You Need for Visual Instruction Fine-tuning
X Han, Y Wang, B Zhai, Q You, H Yang
2024 IEEE International Conference on Multimedia and Expo (ICME), 1-5, 2024
1 · 2024
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C Xie, S Cai, W Wang, P Li, Z Sang, K Yang, Y Zhang, Z Li, G Zhu, Z Liu, ...
arXiv preprint arXiv:2502.11573, 2025
2025
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data
X Wang, Q Cui, Y Tao, Y Wang, Z Chai, X Han, B Liu, J Yuan, J Su, ...
arXiv preprint arXiv:2410.00773, 2024
2024
eRAM-V: From Interaction to Integration in Efficient Multimodal Large Language Models
H Liu, Y Jian, X Han, Q You, H Huang, R He