CVPR 4

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning (CVPR 2025) Liu, Huabin, Filip Ilievski, and Cees GM Snoek. "Commonsense video question answering through video-grounded entailment tree reasoning." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025. Abstract: For commonsense video question answering (VQA), this paper ..

Paper 2026.01.06

On the Faithfulness of Vision Transformer Explanations

On the Faithfulness of Vision Transformer Explanations (CVPR 2024) Wu, Junyi, et al. "On the faithfulness of vision transformer explanations." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. Abstract: To interpret Vision Transformers, post-hoc explanations assign importance scores (salience scores) to input pixels, producing heatmaps that humans can understand. However, these explanations actually ... the model's output ..

Paper 2025.10.14

Question Aware Vision Transformer for Multimodal Reasoning

Question Aware Vision Transformer for Multimodal Reasoning (CVPR 2024) Ganz, Roy, et al. "Question aware vision transformer for multimodal reasoning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. Abstract: Vision-language models have enabled notable progress in multimodal reasoning. These architectures typically consist of a vision encoder, an LLM, and ... visual features ... the LLM's representation spac..

Paper 2025.09.24

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training (CVPR 2025) Qiu, Haiyi, et al. "STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025. Abstract: Recently, Video-LLMs ... basic video understanding (captioning, coarse-grained question answe..

Paper 2025.09.17