Molmo

class grid.model.perception.vlm.molmo.Molmo(*args, **kwargs)

Molmo: Visual Question Answering Model

This class wraps the Molmo vision-language model, which answers natural-language queries about images, for example by generating a description of an image.

Credits:

https://github.com/allenai/molmo

License:

This code is licensed under the Apache 2.0 License.

__init__()
Return type:

None

run(image, query)

Given an image and a query about its contents, return the answer to the query.

Parameters:
  • image (Image.Image) -- the image to query

  • query (str) -- the question or task instruction about the image

Returns:

the model's response to the query about the image

Return type:

str

Example

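>>> from grid.model.perception.vlm.molmo import Molmo
>>> molmo = Molmo()
>>> # img is assumed to be a PIL Image.Image, e.g. loaded with PIL.Image.open(...)
>>> outputs = molmo.run(img, "Describe this image.")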
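
Because run(image, query) takes one query at a time, asking several questions about the same image means calling it repeatedly. The sketch below shows one possible way to do that. The helper ask_many, the SupportsRun protocol, and the EchoModel stand-in are hypothetical names introduced for illustration and are not part of the grid package. EchoModel only mimics the documented run(image, query) -> str signature so the example runs without model weights; in practice a Molmo instance and a real Image.Image would presumably be passed instead.

```python
from typing import Any, Dict, Iterable, Protocol


class SupportsRun(Protocol):
    """Any object exposing run(image, query) -> str, the signature documented for Molmo.run."""

    def run(self, image: Any, query: str) -> str: ...


def ask_many(model: SupportsRun, image: Any, queries: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: run each query against the same image, mapping query -> answer."""
    return {query: model.run(image, query) for query in queries}


class EchoModel:
    """Hypothetical stand-in for Molmo; it does not look at the image at all."""

    def run(self, image: Any, query: str) -> str:
        return f"answer to: {query}"


# With the real wrapper this would be ask_many(Molmo(), img, [...]) (assumption).
answers = ask_many(
    EchoModel(),
    image=None,
    queries=["Describe this image.", "How many people are visible?"],
)
for query, answer in answers.items():
    print(f"{query} -> {answer}")
```

Each call is independent, so the answers come back keyed by the query text in the order the queries were given.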