LLaVA

class grid.model.perception.vlm.llava.LLaVA(*args, **kwargs)

LLaVA: Visual Question Answering Model

This class wraps the LLaVA model, which answers natural-language questions about images.

Credits:

https://llava-vl.github.io/

License:

This code is licensed under the Apache 2.0 License.

__init__()
Return type:

None

run(image, query)

Given an image and a query about its contents, return the answer to the query.

Parameters:
  • image (np.ndarray) -- the image to answer questions about

  • query (str) -- the question to ask about the image

Returns:

the response to the query about the image

Return type:

str

Example

>>> import numpy as np
>>> img = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder image
>>> llava = LLaVA()
>>> outputs = llava.run(img, "What do you see?")
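Because run() takes the image as a raw np.ndarray, any loading pipeline can feed it, provided the array is in a conventional image layout. The sketch below prepares an (H, W, 3) uint8 RGB array from a float-valued image; the shape and dtype conventions are assumptions about typical vision-model input, not guarantees stated by this API.

```python
import numpy as np

# Suppose an upstream pipeline produces a float image with values in [0, 1].
float_img = np.random.rand(224, 224, 3)

# Convert to the uint8 RGB array commonly expected by vision models
# (assumption: the LLaVA wrapper accepts this convention).
img = (float_img * 255).astype(np.uint8)

# Sketch only -- instantiating LLaVA requires model weights:
# llava = LLaVA()
# answer = llava.run(img, "Describe the scene.")
print(img.shape, img.dtype)
```

The conversion matters because a float array passed directly may be misinterpreted by image preprocessing that assumes 0-255 integer pixel values.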