ISSN: 2755-6190 | Open Access

Open Access Journal of Artificial Intelligence and Technology

Volume : 2 Issue : 1

VisionVerse: Dynamic Video Question Answering Through Retrieval Augmented Generation

Abhiram S Sajeev*, Adhya Sanil Joseph, Amal Madhav T, Surekha Mariam Varghese and Aby Abahai T

ABSTRACT

In the digital age, video content has become a prominent medium for information sharing, entertainment, and education. However, navigating and comprehending lengthy video content can be time-consuming and challenging for users. This project introduces a solution that leverages state-of-the-art language models to transform extensive video content into concise text, making it more accessible and user-friendly. Using Retrieval Augmented Generation (RAG), the system condenses videos into text form while retaining the core message and key details. This improves the efficiency of content consumption by giving users a quick, readable overview of the video's contents. The project also introduces an interactive chatbot that lets users engage with the video content: they can ask questions, seek clarifications, or delve deeper into specific aspects of the video. The chatbot is powered by a Large Language Model, which enables meaningful and context-aware interactions. This approach not only facilitates better understanding but also encourages active participation and knowledge retention. The system combines the benefits of interactive chatbot and video summarization technologies, offering users a dynamic and engaging means to access and interact with video content, and it employs video-to-text summarization techniques to automatically extract the most relevant information from videos. The work has potential applications in education, research, and the wider digital content landscape, where the efficient dissemination of information is paramount.
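To make the retrieval step of RAG concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes the video has already been converted to a plain-text transcript, and it substitutes a simple bag-of-words cosine similarity for whatever retriever the system actually uses. All function names (`chunk_transcript`, `retrieve`, `build_prompt`) are hypothetical, and the call to the language model itself is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_transcript(transcript, max_words=40):
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = transcript.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question.

    Assumption: bag-of-words similarity stands in for a real retriever.
    """
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that grounds the question in the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this video context:\n{context}\n\nQuestion: {question}"
```

In a full system, the string returned by `build_prompt` would be passed to a language model, so that each answer is grounded in the transcript passages most relevant to the user's question.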