Molmo
- class grid.model.perception.vlm.molmo.Molmo(*args, **kwargs)
Molmo: Visual Question Answering Model
This class wraps the Molmo vision-language model, which answers natural-language queries about images and can generate descriptions of their contents.
- Credits:
- License:
This code is licensed under the Apache 2.0 License.
- __init__()
- Return type:
None
- run(image, query)
Given an image and a query about the contents of that image, return the model's answer to the query.
- Parameters:
image (Image.Image) -- the image to ask about
query (str) -- the question or task instruction concerning the image
- Returns:
the model's response to the query about the image
- Return type:
str
Example
>>> molmo = Molmo()
>>> outputs = molmo.run(img, "Describe this image.")
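Because run(image, query) takes one query at a time, asking several questions about the same image means calling it in a loop. The following is a minimal sketch of such a loop, not part of the documented API: ask_many, SupportsRun, and StubModel are hypothetical names introduced here for illustration, and StubModel merely stands in for Molmo so the snippet runs without GRID or model weights installed.

```python
from typing import Any, Dict, Iterable, Protocol


class SupportsRun(Protocol):
    """Hypothetical protocol: any object exposing run(image, query) -> str."""

    def run(self, image: Any, query: str) -> str: ...


def ask_many(model: SupportsRun, image: Any, queries: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: run each query against the same image.

    Returns a dict mapping each query string to the model's answer.
    """
    answers: Dict[str, str] = {}
    for query in queries:
        answers[query] = model.run(image, query)
    return answers


class StubModel:
    """Hypothetical stand-in for Molmo; it only echoes the query back."""

    def run(self, image: Any, query: str) -> str:
        return f"stub answer to: {query}"


# With GRID installed, the stub would presumably be replaced by:
#   from grid.model.perception.vlm.molmo import Molmo
#   model = Molmo()
# and image would be a PIL Image.Image, as run() is documented to expect.
model = StubModel()
answers = ask_many(
    model,
    image=None,  # placeholder; the stub ignores the image
    queries=["Describe this image.", "How many people are visible?"],
)
for query, answer in answers.items():
    print(query, "->", answer)
```

Keeping the helper typed against a small protocol rather than Molmo itself means the same loop can be exercised in tests with a stub and then pointed at the real model unchanged.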