LLaVANeXT
- class grid.model.perception.vlm.llava_next.LLaVANeXT(*args, **kwargs)
LLaVANeXT: Visual Question Answering Model
This class implements a wrapper for the LLaVANeXT model, which answers questions about visual media (images/videos) using the LLaVANeXT framework.
- Credits:
- License:
This code is licensed under the Apache 2.0 License.
- __init__()
- Return type:
None
- preprocess_video(video_path)
Preprocess a video file into frames.
- Parameters:
video_path (str) -- Path to the video file to be processed.
- Returns:
An array of processed video frames of shape (num_frames, height, width, 3).
- Return type:
np.ndarray
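The documentation does not say how preprocess_video chooses which frames to keep. As a minimal sketch only, assuming uniform sampling (a common choice for video-language models, not confirmed for this wrapper), the indices of the frames that end up along the num_frames axis could be picked as below. The helper name sample_frame_indices is hypothetical and is not part of the GRID API.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to `num_frames` evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: uniform sampling is an assumption, not the
    documented behavior of preprocess_video.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames >= total_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    if num_frames == 1:
        # A single frame: take the middle one.
        return [total_frames // 2]
    last = total_frames - 1
    # Integer arithmetic keeps the first and last frame and spaces the rest evenly.
    return [(i * last) // (num_frames - 1) for i in range(num_frames)]
```

Stacking the decoded frames at these indices would give an array whose first dimension is the number of indices returned, matching the (num_frames, height, width, 3) layout described above.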
- run(image=None, video=None, query='Desribe the content in the image')
Given a media input (image or video) and an accompanying query, return the answer to the query.
- Parameters:
image (np.ndarray) -- Image to ask about. Defaults to None.
video (np.ndarray) -- Video to ask about. Defaults to None.
query (str) -- Question or task instruction for the model.
- Returns:
The model's response to the query about the given image or video.
- Return type:
str
Example
>>> llava_next = LLaVANeXT()
>>> outputs = llava_next.run(img, query="What do you see?")
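Because query is the third parameter in the run signature, a string passed as the second positional argument binds to video rather than query. The sketch below illustrates this with a stand-in function that copies only the documented parameter order; it is not the real method and does not load any model.

```python
import inspect


def run(image=None, video=None, query="Desribe the content in the image"):
    """Stand-in with the documented parameter order; not the real LLaVANeXT.run."""
    return None


sig = inspect.signature(run)

# Second positional argument lands on `video`, not `query`.
bound_positional = dict(sig.bind("IMG", "What do you see?").arguments)

# Passing the question by keyword binds it to `query` as intended.
bound_keyword = dict(sig.bind("IMG", query="What do you see?").arguments)

print(bound_positional)
print(bound_keyword)
```

Passing query by keyword avoids this ambiguity, and the same applies when supplying video=... instead of an image.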