STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training (CVPR 2025)

Qiu, Haiyi, et al. "STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training." Proceedings of the Computer Vision and Pattern Recognition Conference. 2025.

Abstract

Recently, Video-LLMs [...] basic video understanding (captioning, coarse-grained question answe..