LLaVA
- class grid.model.perception.vlm.llava.LLaVA(*args, **kwargs)
LLaVA: Visual Question Answering Model
This class implements a wrapper around the LLaVA model, which answers natural-language questions about visual media (images).
- Credits:
- License:
This code is licensed under the Apache 2.0 License.
- __init__()
- Return type:
None
- run(image, query='Describe the content in the image')
Given an image and a query about its contents, return an answer to the query.
- Parameters:
image (np.ndarray) -- the image to be analyzed
query (str) -- the question or instruction about the image
- Returns:
the response to the query about the image
- Return type:
str
Example
>>> llava = LLaVA()
>>> outputs = llava.run(img, "What do you see?")
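A minimal sketch of preparing the `image` argument for run(). This assumes LLaVA accepts an RGB uint8 NumPy array; the array shape and contents here are purely illustrative:

```python
import numpy as np

# Build a placeholder RGB image as a NumPy array (H x W x 3, uint8),
# matching the np.ndarray type run() expects for its `image` parameter.
img = np.zeros((480, 640, 3), dtype=np.uint8)
img[:, :320] = (255, 0, 0)  # paint the left half red for illustration

# The query is a plain instruction string.
query = "What do you see?"

# Assuming model weights are available locally, the call would be:
# outputs = LLaVA().run(img, query)
```

In practice the array would come from an image loader (e.g. a camera frame or a decoded file) rather than being constructed by hand.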