VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

VideoMind is a multi-modal agent framework that enhances video reasoning by emulating human-like processes: breaking down tasks, localizing and verifying moments, and synthesizing answers. This demo showcases how VideoMind-2B handles video-language tasks. Please open an issue if you encounter any problems.

Video

Roles

Select the role(s) you would like to activate.

🗺️ Planner 🔍 Grounder 📊 Verifier 📝 Answerer
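
To make the role selection concrete, here is a minimal sketch of how the selected roles might be chained over a shared state. Everything in it is an assumption for illustration: the `State` fields, the role functions, the fixed Planner, Grounder, Verifier, Answerer order, and `run_pipeline` are hypothetical names, and each role is a stub rather than a model call. It is not VideoMind's actual API or implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# NOTE: every name here is hypothetical and illustrative only.
Span = Tuple[float, float]  # (start_sec, end_sec)

# Assumed fixed order in which active roles run.
ROLE_ORDER = ["planner", "grounder", "verifier", "answerer"]


@dataclass
class State:
    question: str
    duration: float                                   # video length in seconds
    plan: List[str] = field(default_factory=list)
    candidates: List[Span] = field(default_factory=list)
    moment: Optional[Span] = None
    answer: Optional[str] = None
    trace: List[str] = field(default_factory=list)    # roles that actually ran


def planner(state: State, active: Set[str]) -> None:
    # Stub: the "plan" is simply the other active roles, in order.
    state.plan = [r for r in ROLE_ORDER if r in active and r != "planner"]


def grounder(state: State, active: Set[str]) -> None:
    # Stub: propose three candidate spans instead of running a model.
    half = state.duration / 2
    state.candidates = [(0.0, half), (half, state.duration), (0.0, state.duration)]


def verifier(state: State, active: Set[str]) -> None:
    # Stub: keep the shortest candidate; a real verifier would score each one.
    if state.candidates:
        state.moment = min(state.candidates, key=lambda s: s[1] - s[0])


def answerer(state: State, active: Set[str]) -> None:
    # Use the verified moment, else the first candidate, else the whole video.
    if state.moment is not None:
        span = state.moment
    elif state.candidates:
        span = state.candidates[0]
    else:
        span = (0.0, state.duration)
    state.answer = f"[stub answer to {state.question!r} using {span[0]:.1f}s-{span[1]:.1f}s]"


ROLES: Dict[str, Callable[[State, Set[str]], None]] = {
    "planner": planner,
    "grounder": grounder,
    "verifier": verifier,
    "answerer": answerer,
}


def run_pipeline(question: str, duration: float, active: Set[str]) -> State:
    """Run only the active roles, in ROLE_ORDER, over one shared state."""
    state = State(question=question, duration=duration)
    for name in ROLE_ORDER:
        if name in active:
            ROLES[name](state, active)
            state.trace.append(name)
    return state


if __name__ == "__main__":
    result = run_pipeline("What happens?", 60.0, set(ROLE_ORDER))
    print(result.trace)
    print(result.answer)
```

In this sketch, deselecting a role simply skips its step, so activating only the Answerer answers over the whole video without any localization.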

Text Prompt
