MiniCPM

class grid.model.perception.vlm.minicpm.MiniCPM(*args, **kwargs)

MiniCPM-V 2.6 Visual Question Answering Model

This class implements a wrapper for MiniCPM-V 2.6 that answers questions about both images and videos.

Credits:

https://huggingface.co/openbmb/MiniCPM-V-2_6

License:

This code is licensed under the Apache 2.0 License. We have obtained an official license from the company to offer this model on GRID.

__init__()

Initializes the MiniCPM model and tokenizer.

Loads the model and tokenizer from the Hugging Face Hub, configured for efficient memory usage.

Return type:

None

encode_video(video_path)

Encode a video file into frames.

Parameters:

video_path (str) -- Path to the video file to be processed.

Returns:

A list of PIL Image objects representing the video frames.

Return type:

List[Image.Image]
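Video question answering models typically sample a bounded number of frames uniformly rather than decoding every frame. The helper below is a hypothetical sketch of that index-selection step only; `sample_frame_indices` and its defaults are illustrative names and values, not part of the GRID API, and the actual encode_video implementation may differ.

```python
def sample_frame_indices(total_frames: int, fps: float, max_frames: int = 64) -> list[int]:
    """Pick roughly one frame per second, capped at max_frames,
    spread uniformly across the video (illustrative sketch only)."""
    # Start with one frame per second of video.
    step = max(round(fps), 1)
    indices = list(range(0, total_frames, step))
    # If that still exceeds the cap, thin the list uniformly.
    if len(indices) > max_frames:
        stride = len(indices) / max_frames
        indices = [indices[int(i * stride)] for i in range(max_frames)]
    return indices
```

The selected indices would then be decoded and converted to PIL Images to produce the List[Image.Image] that encode_video returns.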

preprocess_media(media_path, is_video=False)

Preprocess an image or video file.

Parameters:
  • media_path (str) -- Path to the media file to be processed.

  • is_video (bool) -- If True, treat the media as a video.

Returns:

A PIL Image object for images or a list of PIL Images for videos.

Return type:

Union[Image.Image, List[Image.Image]]
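Since preprocess_media requires the caller to pass is_video explicitly, a small caller-side helper can infer the flag from the file extension. This is a hypothetical convenience, not part of the GRID API; the extension list is illustrative.

```python
from pathlib import Path

# Extensions treated as video; adjust as needed (illustrative list).
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}

def looks_like_video(media_path: str) -> bool:
    """Guess whether a media path points at a video, by extension alone."""
    return Path(media_path).suffix.lower() in VIDEO_EXTENSIONS
```

A call site could then read media = model.preprocess_media(path, is_video=looks_like_video(path)).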

run(query, media, is_video=False)

Given media (image(s) or video) and an accompanying query, return the answer to the query.

Parameters:
  • query (str) -- The question to ask about the media.

  • media (Union[Image.Image, List[Image.Image], str]) -- The media to be processed: a single image, a list of video frames, or a path to an .mp4 video file.

  • is_video (bool) -- If True, treat the media as a video.

Returns:

Response to the query.

Return type:

str
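Chat-style VQA models of this family expect the frames and the text query combined into a single user turn. The sketch below shows one plausible shape of that assembly step; `build_msgs` is an illustrative helper, not part of the GRID API, and run's actual internals may differ.

```python
def build_msgs(query: str, frames: list) -> list[dict]:
    """Assemble a chat-style message: media first, then the text
    query, inside one 'user' turn (illustrative sketch only)."""
    # For a single image, frames is a one-element list; for a video,
    # it holds every sampled frame.
    return [{"role": "user", "content": [*frames, query]}]
```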

Example

>>> mini_cpm = MiniCPM()
>>> img = mini_cpm.preprocess_media("path/to/image.jpg")
>>> outputs = mini_cpm.run(query="What do you see?", media=img)