LLaVANeXT

class grid.model.perception.vlm.llava_next.LLaVANeXT(*args, **kwargs)

LLaVANeXT: Visual Question Answering Model

This class wraps the LLaVANeXT model, which answers natural-language questions about visual media (images or videos).

Credits:

https://github.com/LLaVA-VL/LLaVA-NeXT

License:

This code is licensed under the Apache 2.0 License.

__init__()
Return type:

None

preprocess_video(video_path)

Preprocess a video file into frames.

Parameters:

video_path (str) -- Path to the video file to be processed.

Returns:

An array of processed video frames of shape (num_frames, height, width, 3).

Return type:

np.ndarray
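The frame selection used inside preprocess_video is not documented here. As a rough illustration of what an array of shape (num_frames, height, width, 3) looks like, the sketch below uniformly samples frames from an already decoded clip. The helper name sample_frames_uniformly, the frame counts, and the zero-filled stand-in clip are all hypothetical and are not part of the LLaVANeXT wrapper.

```python
import numpy as np


def sample_frames_uniformly(frames: np.ndarray, num_frames: int) -> np.ndarray:
    """Pick up to `num_frames` evenly spaced frames from a decoded clip.

    Hypothetical helper for illustration only; the wrapper's own
    sampling strategy may differ.
    """
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("expected shape (num_frames, height, width, 3)")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    total = frames.shape[0]
    if total == 0:
        raise ValueError("clip contains no frames")
    # Evenly spaced indices from the first to the last frame; never
    # request more frames than the clip actually has.
    count = min(num_frames, total)
    indices = np.linspace(0, total - 1, num=count).round().astype(int)
    return frames[indices]


# Stand-in for a decoded clip: 120 RGB frames of 48x64 pixels.
decoded = np.zeros((120, 48, 64, 3), dtype=np.uint8)
sampled = sample_frames_uniformly(decoded, num_frames=8)
print(sampled.shape)  # (8, 48, 64, 3)
```

In practice the decoded clip would come from a video reader rather than np.zeros; only the resulting array layout matters for this illustration.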

run(image=None, video=None, query='Desribe the content in the image')

Given a media input (image or video) and an accompanying query, return the answer to the query.

Parameters:
  • image (np.ndarray) -- The image to ask about. Defaults to None.

  • video (np.ndarray) -- The video to ask about. Defaults to None.

  • query (str) -- The question or task instruction for the model.

Returns:

The model's response to the query about the given image or video.

Return type:

str

Example

>>> llava_next = LLaVANeXT()
>>> outputs = llava_next.run(image=img, query="What do you see?")
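Because run is declared with the parameter order image, video, query, a second positional argument binds to video rather than query. The sketch below shows this with a stand-in function that only mirrors the documented signature (the default query string is copied verbatim from it). The stand-in is hypothetical, does not load any model, and img is a placeholder value.

```python
def run(image=None, video=None, query="Desribe the content in the image"):
    """Stand-in with the documented parameter order.

    Hypothetical: it only reports how the arguments were bound.
    """
    return {"image": image, "video": video, "query": query}


img = "<image array>"  # placeholder for an np.ndarray image

# Positional call: the question is bound to `video`, and `query`
# silently keeps its default value.
positional = run(img, "What do you see?")

# Keyword call: the question reaches `query` as intended.
keyword = run(image=img, query="What do you see?")

print(positional["video"])  # What do you see?
print(keyword["query"])     # What do you see?
```

Passing image, video, and query by keyword avoids this mix-up.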