Posts tagged 'Multimodal'

Multimodal 2

Question Aware Vision Transformer for Multimodal Reasoning

Question Aware Vision Transformer for Multimodal Reasoning (CVPR 2024) Ganz, Roy, et al. "Question aware vision transformer for multimodal reasoning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024. Abstract: Vision-language models have enabled notable progress in multimodal reasoning. These architectures usually combine a vision encoder, an LLM, and visual features to the LLM's representation spac..

Paper 2025.09.24

MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time

MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time (ECCV 2024) Chowdhury, Sanjoy, et al. "Meerkat: Audio-visual large language model for grounding in space and time." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024. Abstract: Building on the strong capabilities of LLMs (Large Language Models), recent MLLM (Multimodal Large Language Model) research carries this to visual, audio, and other such modalit..

Paper 2025.09.23

ynnnxxi's start to a brutally tough day ❤︎

Study log Blog | Studying the things I always forget ^..♡⃛


