I am a tenured Distinguished Professor and doctoral supervisor at the Institute of AI for Engineering, Tongji University. I received my Ph.D. in Computer Science from Columbia University.
Previously, I served as a Principal Researcher and Principal Architect at Microsoft Research and the Microsoft Search & AI Division (US headquarters), and as a Principal Applied Science Manager and Head of the GenAI Group at Microsoft AI Asia.
My research pursues the long-term goal of building scalable, knowledge-rich AI systems that can perceive, reason, and act across multimodal, real-world environments. My current work centers on autonomous AI agent systems, the next paradigm for intelligent systems. Key directions include: multi-agent evolution and collaboration, enabling teams of specialized agents to coordinate, adapt, and improve through interaction; agent memory and knowledge-base construction, designing mechanisms for agents to acquire, organize, retrieve, and forget knowledge dynamically; reliable and efficient agent reasoning, ensuring that agents plan and act robustly under real-world constraints; and agent-environment interaction, grounding agents in tools, APIs, and physical or digital environments for end-to-end task completion.
This direction builds naturally on my recent work on large-scale AI systems and LLM trustworthiness. I led the deployment of LLMs into Bing's web-scale recommendation system (200B+ pages), developing quality-aware ranking, LLM-based candidate generation, and user-preference analytics (KDD 2025, RecSys 2024). I have also investigated core capabilities that underpin reliable agents: benchmarking in-context forgetting (ICF-Bench, ICLR 2026), improving RAG robustness against spurious features (ACL 2026), evaluating long-form narrative consistency (ACL 2026), and proposing efficient reasoning methods including self-compression (ConPress, ICML 2026) and trajectory fusion (TrajFusion, ACL 2026).
These efforts are rooted in over a decade of research on multimodal content understanding and knowledge extraction. My earlier work at Columbia University and Microsoft Research established foundations for automatically constructing event-centric knowledge bases from text, images, and video — through visual pattern mining with deep networks (PatternNet, ICMR 2018 Best Paper Poster Award), cross-media event extraction and coreference resolution (ACM Multimedia, EMNLP, NAACL), multimodal emotion reasoning (MEmoR, ACM MM 2020), object detection (CVPR 2020), and scalable visual instance mining.
I serve as a reviewer and editorial board member for ACM MM, ICME, IJCAI, IEEE TMM, IEEE TCSVT, TPAMI, JVCI, JVIS, and other venues.
For a complete list, please visit my Google Scholar profile.