Shizhe Chen

Cited by

	All	Since 2019
Citations	2912	2691
h-index	25	25
i10-index	47	46

Citations per year
Year	Citations
2016	22
2017	59
2018	130
2019	186
2020	220
2021	344
2022	555
2023	817
2024	565

Public access

38 articles available
10 articles not available

Based on funding mandates

Co-authors

Qin Jin, School of Information, Renmin University of China. Verified email at ruc.edu.cn
Ivan Laptev, Visiting professor at MBZUAI, on leave from INRIA. Verified email at inria.fr
Cordelia Schmid, Research director, INRIA. Verified email at inria.fr
Alex Hauptmann, Carnegie Mellon University. Verified email at cs.cmu.edu
Ruihua Song, Renmin University of China. Verified email at ruc.edu.cn

Shizhe Chen

INRIA Paris

Verified email at inria.fr

Computer Vision, Vision-and-Language


Title	Cited by	Year
Fine-grained video-text retrieval with hierarchical graph reasoning. S Chen, Y Zhao, Q Jin, Q Wu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	334	2020
Say as you wish: Fine-grained control of image caption generation with abstract scene graphs. S Chen, Q Jin, P Wang, Q Wu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	240	2020
Speech emotion recognition with acoustic and lexical features. Q Jin, C Li, S Chen, H Wu. 2015 IEEE international conference on acoustics, speech and signal …, 2015	211	2015
History aware multimodal transformer for vision-and-language navigation. S Chen, PL Guhur, C Schmid, I Laptev. Advances in neural information processing systems 34, 5834-5847, 2021	181	2021
Multimodal multi-task learning for dimensional and continuous emotion recognition. S Chen, Q Jin, J Zhao, S Wang. Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 19-26, 2017	165	2017
Multi-modal dimensional emotion recognition using recurrent neural networks. S Chen, Q Jin. Proceedings of the 5th International Workshop on Audio/Visual Emotion …, 2015	138	2015
WenLan: Bridging vision and language by large-scale multi-modal pre-training. Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021	121	2021
Airbert: In-domain pretraining for vision-and-language navigation. PL Guhur, M Tapaswi, S Chen, I Laptev, C Schmid. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021	120	2021
Describing videos using multi-modal fusion. Q Jin, J Chen, S Chen, Y Xiong, A Hauptmann. Proceedings of the 24th ACM international conference on Multimedia, 1087-1091, 2016	117	2016
Think global, act local: Dual-scale graph transformer for vision-and-language navigation. S Chen, PL Guhur, M Tapaswi, C Schmid, I Laptev. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	113	2022
Elaborative rehearsal for zero-shot action recognition. S Chen, D Huang. Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021	93	2021
Multi-modal conditional attention fusion for dimensional emotion prediction. S Chen, Q Jin. Proceedings of the 24th ACM international conference on Multimedia, 571-575, 2016	79	2016
Instruction-driven history-aware policies for robotic manipulations. PL Guhur, S Chen, RG Pinel, M Tapaswi, I Laptev, C Schmid. Conference on Robot Learning, 175-187, 2023	78	2023
Video captioning with guidance of multimodal latent topics. S Chen, J Chen, Q Jin, A Hauptmann. Proceedings of the 25th ACM international conference on Multimedia, 1838-1846, 2017	73	2017
Sketch, ground, and refine: Top-down dense video captioning. C Deng, S Chen, D Chen, Y He, Q Wu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	68	2021
Multi-modal multi-cultural dimensional continues emotion recognition in dyadic interactions. J Zhao, R Li, S Chen, Q Jin. Proceedings of the 2018 on audio/visual emotion challenge and workshop, 65-72, 2018	56	2018
Unpaired cross-lingual image caption generation with self-supervised rewards. Y Song, S Chen, Y Zhao, Q Jin. Proceedings of the 27th ACM international conference on multimedia, 784-792, 2019	43	2019
Generating Video Descriptions With Latent Topic Guidance. S Chen, Q Jin, J Chen, A Hauptmann. IEEE Transactions on Multimedia 21 (9), 2407-2418, 2019	40	2019
Towards diverse paragraph captioning for untrimmed videos. Y Song, S Chen, Q Jin. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	39	2021
Few-shot action recognition with hierarchical matching and contrastive learning. S Zheng, S Chen, Q Jin. European Conference on Computer Vision, 297-313, 2022	36	2022


