VideoLLaVA
- class grid.model.perception.vlm.video_llava.VideoLLaVA(*args, **kwargs)
VideoLLaVA: Visual Question Answering Model
This class implements a wrapper for the Video-LLaVA model, which answers natural-language questions about visual media (images and videos) using the Video-LLaVA framework.
- License:
This code is licensed under the Apache 2.0 License.
- __init__()
- Return type:
None
- preprocess_video(video_path)
Preprocess a video file into frames.
- Parameters:
video_path (str) -- Path to the video file to be processed.
- Returns:
An array of processed video frames of shape (num_frames, height, width, 3).
- Return type:
np.ndarray
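For intuition, here is a minimal sketch of how a video file can be turned into a frame array of this shape using OpenCV. The function name frames_from_video, the num_frames parameter, and the uniform-sampling strategy are illustrative assumptions, not the actual implementation of preprocess_video.

import cv2
import numpy as np

def frames_from_video(video_path: str, num_frames: int = 8) -> np.ndarray:
    # Uniformly sample num_frames frames across the video (assumed strategy).
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes frames as BGR; convert to RGB channel order.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    # Resulting shape: (num_frames, height, width, 3)
    return np.stack(frames)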
- run(query, image=None, video=None)
Given a media input (an image or a video) and an accompanying query, return the answer to the query. Note that videos must first be preprocessed with the preprocess_video method before being passed to this method.
- Parameters:
image (np.ndarray, optional) -- the image to query
video (np.ndarray, optional) -- the preprocessed video frames to query (see preprocess_video)
query (str) -- the question or task instruction
- Returns:
the model's response to the query about the given media
- Return type:
str
Example
>>> from grid.model.perception.vlm.video_llava import VideoLLaVA
>>> video_llava = VideoLLaVA()
>>> video = video_llava.preprocess_video("path/to/video.mp4")
>>> outputs = video_llava.run(query="What is happening in the video?", video=video)
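Since run also accepts a raw image array, an image query looks similar. The zero-filled array below is only a placeholder; in practice you would load a real image (e.g. with PIL or OpenCV).

>>> import numpy as np
>>> image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB image
>>> answer = video_llava.run(query="What objects are visible?", image=image)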