The VideoLLaVA class provides a wrapper around the Video-LLaVA model, which answers natural-language questions about visual media (images and videos).

class VideoLLaVA()

Parameters:

use_local : bool, default: True
    If True, the inference call runs on the local VM; otherwise it is offloaded to GRID-Cortex.
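
For example, to offload inference to GRID-Cortex instead of running it on the local VM, flip the documented use_local flag (a minimal sketch; the rest of the setup is unchanged):

from grid.model.perception.vlm.video_llava import VideoLLaVA

# Offload the inference call to GRID-Cortex instead of the local VM
model = VideoLLaVA(use_local=False)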

def run()

Parameters:

image : np.ndarray
    The input RGB image of shape (M, N, 3).

video : str
    The path to the input video.

prompt : str (required)
    The question to answer about the media.

Returns:

str
    The response to the prompt.
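
Because run() accepts either an RGB image or a path to a video file, a video query looks like the following (a minimal sketch; the file path and prompt are illustrative placeholders):

from grid.model.perception.vlm.video_llava import VideoLLaVA

model = VideoLLaVA(use_local=True)

# Ask a question about a video file on disk (path and prompt are examples)
result = model.run(video="clips/drive.mp4", prompt="What happens in this clip?")
print(result)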

from grid.model.perception.vlm.video_llava import VideoLLaVA
from grid.robot.wheeled.airgen_car import AirGenCar  # import path assumed from the GRID SDK

car = AirGenCar()

# Capture an RGB image from the AirGen simulator's front-center camera
# and run model inference on it.
img = car.getImage("front_center", "rgb").data

model = VideoLLaVA(use_local=True)
result = model.run(image=img, prompt="<prompt>")  # replace <prompt> with your question
print(result)

This code is licensed under the Apache 2.0 License.
