VideoMind is a multi-modal agent framework that enhances video reasoning by emulating human-like processes: breaking down tasks, localizing and verifying moments, and synthesizing answers. This demo showcases how VideoMind-2B handles video-language tasks. Please open an issue if you run into any problems.