TECHNOLOGY, ART, AND COMPASSION

Dr. Hao Sun (孙浩)

I am Hao Sun (孙浩), an innovative researcher specializing in Artificial Intelligence (AI), with expertise in multimodal learning, large language models (LLMs), vision-language-action (VLA) models, embodied AI, reinforcement learning, and affective computing. I am passionate about advancing Artificial General Intelligence (AGI) and transformative technologies that push the boundaries of human knowledge and civilization. My work has been published in top-tier venues including ACL, ACM Multimedia, Information Fusion, IEEE Transactions on Affective Computing, and Pattern Recognition, and has accumulated 500+ citations. I am also a co-inventor on several patents.

Education & Experience:

06.2025 - Now: Ritsumeikan University (Osaka, Japan). Senior Researcher
- Invited by Yen-Wei Chen, Fellow of the Engineering Academy of Japan
- Oversee research on LLMs, multimodality, VLA, and embodied AI in the host laboratory
- Lead a research team focused on AGI and VLA using LLMs, reinforcement learning, and bionics
08.2023 - 08.2024: Ritsumeikan University (Osaka & Otsu, Japan). Visiting Scholar
- Invited by Yen-Wei Chen, Fellow of the Engineering Academy of Japan
- Funded by the Zhejiang University PhD Academic Star Program (awarded to the top 100 graduate students)
- Led a project on developing a unified multimodal and multitask framework with LLMs
- Led a project on equipping LLMs with multimodal processing capabilities through parameter-efficient fine-tuning
- Published findings in IEEE Transactions on Affective Computing, Pattern Recognition, etc.
09.2020 - 06.2025: Zhejiang University (Hangzhou, China). Ph.D. in Computer Science and Technology
- Awarded the Outstanding Graduate Honor, selected as one of the top 10% of graduates for academic achievement
09.2016 - 06.2020: Harbin Institute of Technology. B.E. in Software Engineering
- Awarded the Outstanding Graduate Honor, granted to the top 8% of students

For more details on my research and publications, please visit my Google Scholar page or ORCID page.

Recent Publications, Patents & Software

Here are some of my recent academic papers, covering topics such as multimodal learning, large language models, vision-language-action (VLA) agents, embodied AI, and affective computing. My full publication list (20+ papers, 500+ citations to date) is available on my Google Scholar page.

Papers as first or corresponding author:

MIRTH: Mutual-Information Reasoning with Temporal Hubs for Vision-Language-Action Agents. The 64th Annual Meeting of the Association for Computational Linguistics (ACL), 2026.
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning. Pattern Recognition, 2025. (IF: 8.5)
Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning. IEEE Transactions on Affective Computing, Vol.16(3), pp.1606-1617, 2025. (IF: 13.9)
Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification. Information Fusion, Vol.91, pp.504-514, 2023. (IF: 18.1)
Tensorformer: A Tensor-Based Multimodal Transformer for Multimodal Sentiment Analysis and Depression Detection. IEEE Transactions on Affective Computing, Vol.14(4), pp.2776-2786, 2023. (IF: 13.9)
Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level. Sensors, Vol.21(14), Article 4764, 2021.
CubeMLP: An MLP-Based Model for Multimodal Sentiment Analysis and Depression Estimation. ACM Multimedia, pp.3722-3729, 2022. (Cited 140+)

Preprints in submission (first author):

Multimodal Infusion Tuning for Large Models. arXiv:2403.05060, 2024.
Robust Latent Representation Tuning for Image-text Classification. arXiv:2406.06048, 2024.
Modality-invariant and Specific Prompting for Multimodal Human Perception Understanding. arXiv:2311.10791, 2023.

Papers as co-first, second or third author:

EPIC: Efficient Prompt Interaction for Text-Image Classification. IEEE ICME, 2025. (Co-First Author)
CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning. IEEE ICASSP, 2026.
Improving scDiffusion with Sparsity-Biased Classifier-Free Guidance. IEEE Engineering in Medicine and Biology Society, 2026.
Dynamic Summary Generation for Interpretable Multimodal Depression Detection. IEEE ICASSP, 2026.
IRLSG: Invariant Representation Learning for Single-Domain Generalization in Medical Image Segmentation. IEEE ICASSP, 2024.
LGA: A Language Guide Adapter for Advancing the SAM Model's Capabilities in Medical Image Segmentation. Springer MICCAI, pp.610-620, 2024.
Enhanced Multimodal Depression Detection With Emotion Prompts. IEEE ICASSP, 2025.
DepressionLLM: Emotion- and Causality-aware Depression Detection with Foundation Models. Displays, Vol.92, Article 103304, 2025.
MCKD: Mutually Collaborative Knowledge Distillation for Federated Domain Adaptation and Generalization. IEEE ICASSP, 2023.
CoSTHR: A Heart Rate Estimating Network with Adaptive Color Space Transformation. IEEE Transactions on Instrumentation and Measurement, Vol.71, pp.1-10, 2022. (IF: 5.6)

Recent Patents:

Engineering Progress Determining Method and Device Based on Multi-Mode Time Sequence Information Fusion. App. No: US20250005475A1. Country: US. Publication Date: 2023-07. Third Inventor
Engineering Progress Determining Method and Device Based on Multi-Mode Time Sequence Information Fusion. Grant No: CN116502882B. Country: CN. Grant Date: 2023-10. Third Inventor
Rheumatoid Arthritis Activity Grading Device Based on Multimodal Data. Grant No: CN116797572B. Country: CN. Grant Date: 2025-09. Fourth Inventor
A Single-Domain Generalization Method for Medical Image Segmentation. Grant No: CN116596832B. Country: CN. Grant Date: 2025-07. Fifth Inventor

Software Copyrights:

MIRTH Robot Control System V1.0. Registration No.: 2026SR0601586. Country: China. Authorization Date: May 2026.

For 30+ independent projects, please visit my GitHub page.
From 2020 to 2021, I was responsible for publishing TensorFlow tutorials on IMOOC.

Invited Keynotes

Selected invited talks that share my research vision on embodied intelligence, multimodal learning, and AGI with international scientific communities.

Embodied Empathy: Empowering AI with Perception, Cognition, Action, and Empathy (04.2026, Suzhou and Beijing)
- International Science and Technology Innovation Talents' China Tour
- Invited by the International Talent Exchange Center of the Ministry of Science and Technology
Multimodal Contrastive Learning Enhancement Methods for Image Classification and Segmentation Scenarios (02.2024, Kusatsu, Japan)
- International Workshop on Computer Vision and Artificial Intelligence
- Invited by Yen-Wei Chen, Fellow of the Engineering Academy of Japan

Projects

I have been actively involved in several exciting research projects, contributing to advancements in areas such as multimodal learning and real-time monitoring. Here are some of the projects I have recently participated in.

2022 - 2025: Intelligent Integrated Analysis Platform Construction for Rheumatoid Arthritis (RA)
- Funded by the National Key R&D Program, 2022YFC2504605, Ministry of Science and Technology, China
- Aimed to develop an AI-driven platform for integrated analysis to enhance diagnosis and treatment of RA
- Built a new approach that integrates multimodal clinical data and improves diagnostic accuracy by 10%
- Accountable for the development, implementation, and validation of multimodal methodologies
2022 - 2024: Preoperative Early Recurrence Detection and Prediction of Hepatocellular Carcinoma (HCC) Based on Federated Learning
- Funded by Zhejiang Provincial Natural Science Foundation Key Project, LZ22F020012
- Aimed to develop a federated learning solution for early preoperative HCC recurrence prediction that preserves data privacy
- Improved recurrence prediction accuracy by 13% while ensuring data privacy through federated learning
- Responsible for project proposals, multimodal methodologies, and final validation
2022 - 2024: Research on Key Technologies for a Smart Construction Site Management Platform Based on Computer Vision (CV)
- Funded by Hangzhou New Zhongda Technology Co., Ltd., 2022AIZD0147-02
- Aimed to develop a smart construction site monitoring and management platform to improve safety and efficiency
- Successfully built a real-time monitoring platform that reduced violations by 15% and simplified management
- Responsible for project proposals, methodology, acceptance, and project management

Academic Service

I actively contribute to the research community as a guest editor and reviewer for leading journals and conferences in AI, multimodal learning, and affective computing.

Guest Editor (2026 - Now)
- Frontiers in Neuroinformatics
- Journal of Visualized Experiments
Reviewer (2022 - Now)
- Information Fusion, IEEE Transactions on Affective Computing, Pattern Recognition
- ACM Transactions on Multimedia Computing, Communications and Applications
- IEEE/CAA Journal of Automatica Sinica, Information Processing and Management, Neurocomputing, etc.

Honors & Awards

The following honors and awards recognize my academic excellence, research achievements, and leadership contributions throughout my graduate and professional journey.

Excellent Postgraduate Students' Award (06.2025, by Zhejiang University)
- Awarded to the top 10% of outstanding doctoral students in recognition of their academic excellence
Award of Honor for Graduate (Four times) (12.2024 / 12.2023 / 12.2022 / 12.2021, by Zhejiang University)
- Awarded annually to the top 15% of outstanding doctoral students in recognition of their excellence
Zhejiang University Academic Scholarship (12.2022, by Zhejiang University)
- Awarded to support research by outstanding doctoral students
Outstanding Graduate Leader Award (Twice) (12.2024 / 12.2023, by Zhejiang University)
- Recognizes exceptional graduate students who demonstrate outstanding leadership in their field or community
Graduate with Merit A Performance (12.2023, by Zhejiang University)
- Awarded to graduates who demonstrated exceptional academic performance and active participation in social activities
HUAWEI Scholarship (12.2023, by Zhejiang University)
- Awarded to exceptional students in computer science and AI for academic excellence and research innovation
Outstanding Undergraduate Award (06.2020, by Harbin Institute of Technology)
- Awarded to the top 15% of outstanding undergraduate students in recognition of their excellence
National Aspirational Scholarship (12.2018, by Harbin Institute of Technology)
- Awarded to the top 5% of outstanding undergraduate students in recognition of their excellence

Skills

This section outlines my core academic and engineering skills, including research design, algorithm development, large-scale model training, and practical system implementation, with a strong focus on AI, large language models (LLMs), multimodal learning, and embodied AI.

Academic-Specific: Scholarly Writing, Publication, Peer Review, Conference Presentation, Grant Proposal
AI Research: Algorithm Development, Model Training & Fine-tuning, Data Processing, Evaluation
LLM-Specific: Model Customization, Knowledge Integration, Multimodal Tuning, Scalability and Efficiency
Embodied AI: Action Grounding, Platform Building, Hardware Deployment
Multimodal Research: Framework Design, Task Adaptation, Multimodal System Deployment
Software Engineering: Feasibility and Requirements Analysis, System and Detailed Design, Implementation and Software Maintenance, etc.
Programming Languages & Frameworks: Python, PyTorch, NumPy, TensorFlow, Java, C++, C, HTML, Go, etc.
Full-stack Development: Front-end Programming, Back-end Design and Implementation, Database Systems
Language: Mandarin, English, Japanese

Contact Me

Email
sunhaoxx@zju.edu.cn
sunhaoxx@foxmail.com