Publications

2026

ICLR2026

# Image Generation # Diffusion # Prompt Optimization

TIPO: Text to Image with Text Presampling for Prompt Optimization

Shih-Ying Yeh, Sang-Hyun Park, Giyeong Oh, Min Song, Youngjae Yu

arXiv

ICLR2026

# Embodied AI # Multimodal # Video

D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Suwhan Choi, Jaeyoon Jung, Haebin Seong, Minchan Kim, Minyeong Kim, Yongjun Cho, Yoonshik Kim, Yubeen Park, Youngjae Yu, Yunsung Lee

arXiv

ICLR2026

# NLP # Multilingual # CoT

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Amit Agarwal, Hyunwoo Ko, Chanuk Lim, Srikant Panda, Minhyuk Kim, Nikunj Drolia, Dasol Choi, Kyong-Ha Lee, Youngjae Yu

arXiv

ICLR2026

# Multimodal # MLLM

Teaching Metric Distance to Autoregressive Multimodal Foundational Models

Jiwan Chung, Saejin Kim, Yongrae Jo, Jaewoo Park, Dongjun Min, Youngjae Yu

arXiv

AAAI2026 (Oral)

# Multimodal # AudioLLM

Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism

Jinhong Jeong*, Sunghyun Lee*, Jaeyoung Lee, Seonah Han, Youngjae Yu

arXiv

AAAI2026

# Multimodal # LLM # Benchmark

Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation

Jaewoo Park*, Jungyang Park*, Dongju Jang, Jiwan Chung, Byungwoo Yoo, Jaewoo Shin, Seonjoon Park, Taehyeong Kim, Youngjae Yu

arXiv

2025

Humanoids 2025 (Workshop)

# Robotics # Humanoid

Baymax in Reality: A Humanoid System for Non-Contact Health Monitoring and Empathetic Interaction

Junhyeong Park, Taemoon Jeong, Minseo Kwak, Jisoo Kim, Seungbeen Lee, Sungjoon Choi, Youngjae Yu

Humanoids 2025 (Workshop)

# Robotics # Humanoid

K-pop Demon Robots

Sungwoong Kim, Minseo Kim, Siyeol Kim, Hwasup Lim, Youngjae Yu

CIKM2025

# Cross-lingual # Embeddings

NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance

Hanwool Lee, Sara Yu, Yewon Hwang, Jonghyun Choi, Heejae Ahn, Sungbum Jung, Youngjae Yu

arXiv

NeurIPS2025

# Computer Vision

Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

Giyeong Oh, Woohyun Cho, Siyeol Kim, Suhwan Choi, Youngjae Yu

arXiv

NeurIPS2025

# LLM # DPO # Human Preference

KL Penalty Control via Perturbation for Direct Preference Optimization

Sangkyu Lee, Janghoon Han, Hosung Song, Stanley Jungkyu Choi, Honglak Lee, Youngjae Yu

arXiv

NeurIPS2025

# Computer Vision

Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation

Jeongin Kim, Wonho Bae, YouLee Han, Giyeong Oh, Youngjae Yu, Danica J. Sutherland, Junhyug Noh

arXiv

EMNLP2025

# Embodied AI # LLM # Safety

Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making

Yejin Son*, Minseo Kim*, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chanyoung Park

arXiv

EMNLP2025

# Multimodal # Agent # Reasoning

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu

arXiv

EMNLP2025

# Multimodal # Document # Information Retrieval

Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation

Yejin Choi*, Jaewoo Park*, Janghan Yoon, Saejin Kim, Jaehyun Jeon, Youngjae Yu

arXiv

EMNLP2025

# Multimodal # Audio # Video

MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation

Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu

arXiv

EMNLP2025 (Findings)

# Multimodal # Commonsense Reasoning # Abductive Reasoning

Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd

Yejin Son*, Saejin Kim*, Dongjun Min, Youngjae Yu

arXiv

COLM2025

# Multimodal # Safety # Societal Implications

G1yphD3c0de: Towards Safer Language Models on Visually Perturbed Texts

Yejin Choi, Yejin Yeo, Yejin Son, Seungju Han, Youngjae Yu

COLM2025

# NLP # Fact Verification

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Wooseok Seo*, Seungju Han*, Jaehun Jung, Benjamin Newman, Seungwon Lim, Seungbeen Lee, Ximing Lu, Yejin Choi, Youngjae Yu

arXiv

COLM2025

# Multimodal # Video

HIPPO-VIDEO: Simulating Watch Histories with Large Language Models for History-Driven Video Highlighting

Jeongeun Lee, Youngjae Yu, Dongha Lee

arXiv

ICCV2025

# Video Generation # Distillation # Preference Learning

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models

Jisoo Kim, Wooseok Seo, Junwan Kim, Seungho Park, Sooyeon Park, Youngjae Yu

arXiv

ICCV2025

# 3D # Human Motion # Generation

DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding

Jungbin Cho*, Junwan Kim*, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu

arXiv

ICCV2025

# Multimodal # Ambiguity

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Heejeong Nam, Jinwoo Ahn, Keummin Ka, Jiwan Chung, Youngjae Yu

arXiv

MICCAI2025

# Computer Vision # Scalp Diagnosis # Image Translation

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

Youngmin Kim*, Saejin Kim*, Hoyeon Moon, Youngjae Yu, Junhyug Noh

arXiv

ACL2025

# Multimodal # Nonverbal Conversation # Video # 3D

Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues

Youngmin Kim*, Jiwan Chung*, Jisoo Kim, Sunghyun Lee, Sangkyu Lee, Junhyeok Kim, Cheoljong Yang, Youngjae Yu

arXiv

ACL2025 (Oral)

# NLP # Personality # Reinforcement Learning

Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games

Seungwon Lim, Seungbeen Lee, Dongjun Min, Youngjae Yu

arXiv

ACL2025

# Multimodal # MLLM

Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?

Jiwan Chung, Janghan Yoon, Junhyeong Park, Sangeyl Lee, Joowon Yang, Sooyeon Park, Youngjae Yu

arXiv

ACL2025

# NLP # LLM # Safety

Representation Bending for Large Language Model Safety

Ashkan Yousefpour*, Taeheon Kim*, Ryan S. Kwon, Seungbeen Lee, Wonje Jeung, Seungju Han, Harrison Ngan, Youngjae Yu, Jonghyun Choi

arXiv

# Computer Vision # Video # Industrial Application

SlumpGuard: An AI-Powered Real-Time System for Automated Concrete Slump Prediction via Video Analysis

Youngmin Kim*, Giyeong Oh*, Kwangsoo Youm, Youngjae Yu

arXiv

# Multimodal # Reasoning

Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation

Jiwan Chung*, Junhyeok Kim*, Siyeol Kim, Jaeyoung Lee, Minsoo Kim, Youngjae Yu

arXiv

# Multimodal # MLLM # AI for Science

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu

arXiv

# Multimodal # UI

Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding

Jaehyun Jeon, Minsoo Kim, Janghan Yoon, Sumin Shim, Yejin Choi, Hanbin Kim, Youngjae Yu

arXiv

# Multimodal # Video # Egocentric

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

Junhyeok Kim*, Jaewoo Park*, Junhee Park, Sangeyl Lee, Jiwan Chung, Jisung Kim, Ji Hoon Joung, Youngjae Yu

arXiv

# LLM # Watermark # Low-rank Adaptation

SEAL: Entangled White-box Watermarks on Low-Rank Adaptation

Giyeong Oh, Saejin Kim, Woohyun Cho, Sangkyu Lee, Jiwan Chung, Dokyung Song, Youngjae Yu

arXiv

ICRA2025

# Embodied AI # Robotics # Navigation

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Yubeen Park, Minseo Kim, Sungwoong Kim, Sungjae Lee, Hwiseong Park, Jiwan Chung, Youngjae Yu

arXiv

NAACL2025 (Oral)

# Multimodal # LLM # Chart Generation

C^2: Scalable Auto-Feedback for LLM-based Chart Generation

Woosung Koh*, Janghan Yoon*, Minhyung Lee, Youngjin Song, Jaegwan Cho, Jaehyun Kang, Taehyeon Kim, Seyoung Yun, Youngjae Yu, Bongshin Lee

arXiv

NAACL2025 (Findings)

# NLP # Personality # Psychometrics

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Seungbeen Lee*, Seungwon Lim*, Seungju Han, Giyeong Oh, Jiwan Chung, Minju Kim, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

arXiv

NAACL2025 (Findings)

# Multimodal # Egocentric # Dialogue System

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild

Junhyeok Kim, Minsoo Kim, Jiwan Chung, Jungbin Cho, Jisoo Kim, Sungwoong Kim, Gyeongbo Sim, Youngjae Yu

arXiv

AAAI2025

# 3D # Speech # Facial Expression

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

Jisoo Kim*, Jungbin Cho*, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu

arXiv

AAAI2025

# Multimodal # Debiasing

MASS: Overcoming Language Bias in Image-Text Matching

Jiwan Chung, Seungwon Lim, Sangkyu Lee, Youngjae Yu

arXiv

AAAI2025

# Multimodal # Video LLM # Preference

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

arXiv