A Chain-of-LoRA Agent for Long Video Reasoning

VideoMind is a multi-modal agent framework that enhances video reasoning by emulating human-like processes, such as breaking down tasks, localizing and verifying moments, and synthesizing answers. This demo shows how VideoMind-2B handles video-language tasks. If you encounter any problems, please open an issue.

Roles

Select the role(s) you would like to activate.
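The role-based workflow described above can be pictured with a small toy sketch. It is an illustration under stated assumptions, not VideoMind's implementation. The role names (`planner`, `grounder`, `verifier`, `answerer`), the `SharedBackbone` class, the timing heuristic and the candidate moments are all hypothetical. The "adapters" are plain Python functions that stand in for per-role LoRA weights on a single shared model. The sketch shows how activating only a subset of roles changes which steps run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical role order mirroring the steps above:
# break down the task, localize a moment, verify it, synthesize an answer.
ROLE_ORDER = ["planner", "grounder", "verifier", "answerer"]


@dataclass
class SharedBackbone:
    """Toy stand-in for one base model whose behavior is switched by
    swapping a lightweight per-role adapter (the chain-of-LoRA idea).
    No real weights are loaded; adapters are ordinary functions."""
    adapters: Dict[str, Callable[[dict], dict]]
    active: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def switch(self, role: str) -> None:
        # Only the active adapter changes; the backbone itself is shared.
        self.active = role
        self.log.append(role)

    def run(self, state: dict) -> dict:
        return self.adapters[self.active](state)


def plan(state: dict) -> dict:
    # Hypothetical heuristic: only ground/verify if the question asks "when".
    needs_time = "when" in state["question"].lower()
    state["plan"] = ["grounder", "verifier", "answerer"] if needs_time else ["answerer"]
    return state


def ground(state: dict) -> dict:
    # Made-up candidate moments as (start_s, end_s, score).
    state["candidates"] = [(12.0, 18.5, 0.62), (40.0, 47.0, 0.81)]
    return state


def verify(state: dict) -> dict:
    # Keep the highest-scoring candidate, if grounding produced any.
    cands = state.get("candidates")
    if cands:
        state["moment"] = max(cands, key=lambda c: c[2])[:2]
    return state


def answer(state: dict) -> dict:
    moment = state.get("moment")
    where = f" (evidence at {moment[0]}-{moment[1]} s)" if moment else ""
    state["answer"] = f"answer to {state['question']!r}{where}"
    return state


def run_agent(question: str, active_roles: set, backbone: SharedBackbone) -> dict:
    state = {"question": question}
    if "planner" in active_roles:
        backbone.switch("planner")
        state = backbone.run(state)
        steps = state["plan"]
    else:
        steps = ROLE_ORDER[1:]
    # Run the remaining steps in order, skipping roles the user did not activate.
    for role in steps:
        if role in active_roles:
            backbone.switch(role)
            state = backbone.run(state)
    return state


def make_backbone() -> SharedBackbone:
    return SharedBackbone({"planner": plan, "grounder": ground,
                           "verifier": verify, "answerer": answer})


if __name__ == "__main__":
    bb = make_backbone()
    out = run_agent("When does the dog jump?", set(ROLE_ORDER), bb)
    print(bb.log)          # roles run, in order
    print(out["answer"])
```

Keeping one backbone and switching only the adapter mirrors the chain-of-LoRA framing in the title: each role reuses the same model, so adding a role adds little extra weight. The control flow here is deliberately simplified.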