MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time (ECCV 2024)

Chowdhury, Sanjoy, et al. "Meerkat: Audio-visual large language model for grounding in space and time." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.

Abstract

Leveraging the remarkable capabilities of LLMs (Large Language Models), recent MLLM (Multimodal Large Language Model) research extends them to visual, audio, and other such modalit..