Pixtral

class grid.model.perception.vlm.pixtral.Pixtral(*args, **kwargs)

Pixtral: Visual Description Model

This class implements a wrapper for the Pixtral model, which generates image captions based on input images.

Credits:

https://huggingface.co/mistralai/Pixtral-12B-2409

License:

This code is licensed under the Apache 2.0 License.

__init__()
Return type:

None

run(image_url, prompt)

Given an image URL and a prompt, generate a description of the image.

Parameters:
  • image_url (str) -- the URL of the image

  • prompt (str) -- task instruction

Returns:

response to the prompt

Return type:

str

Example

>>> pixtral = Pixtral()
>>> outputs = pixtral.run("https://picsum.photos/id/237/200/300", "Describe this image")