MiniCPM
- class grid.model.perception.vlm.minicpm.MiniCPM(*args, **kwargs)
MiniCPM v2.6 Visual Question Answering Model
This class implements a wrapper for MiniCPM v2.6 that answers questions based on both images and videos.
- License:
This code is licensed under the Apache 2.0 License. We have obtained an official license from the company to offer this model on GRID.
- __init__()
Initializes the MiniCPM model and tokenizer.
Loads the model and tokenizer from the Hugging Face Hub, configured for efficient memory usage.
- Return type:
None
- encode_video(video_path)
Encode a video file into frames.
- Parameters:
video_path (str) -- Path to the video file to be processed.
- Returns:
A list of PIL Image objects representing the video frames.
- Return type:
List[Image.Image]
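Video models like MiniCPM v2.6 accept only a bounded number of frames, so encode_video presumably decodes the file and samples frames uniformly across its length. The exact implementation is not shown here, but the index-selection step can be sketched as follows (the helper name and the sampling rule are illustrative assumptions, not the actual GRID code):

```python
def uniform_sample_indices(num_frames, max_frames):
    """Pick at most max_frames indices spread evenly across num_frames.

    Illustrative helper: the real encode_video may sample differently.
    """
    if num_frames <= max_frames:
        # Short video: keep every frame.
        return list(range(num_frames))
    gap = num_frames / max_frames
    # Take the midpoint of each of max_frames equal-width bins.
    return [int(i * gap + gap / 2) for i in range(max_frames)]
```

For example, `uniform_sample_indices(10, 5)` returns `[1, 3, 5, 7, 9]`; the selected frames would then be decoded and wrapped as PIL Images.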
- preprocess_media(media_path, is_video=False)
Preprocess an image or video file.
- Parameters:
media_path (str) -- Path to the media file to be processed.
is_video (bool) -- If True, treat the media as a video.
- Returns:
A PIL Image object for images or a list of PIL Images for videos.
- Return type:
Union[Image.Image, List[Image.Image]]
- run(query, media, is_video=False)
Given media (image(s) or video) and an accompanying query, return the answer to the query.
- Parameters:
media (Union[Image.Image, List[Image.Image], str]) -- The media to be processed: a PIL image, a list of PIL images, or a path to an .mp4 video file.
is_video (bool) -- If True, treat the media as a video.
query (str) -- The question to answer about the media.
- Returns:
Response to the query.
- Return type:
str
Example
>>> mini_cpm = MiniCPM()
>>> img = mini_cpm.preprocess_media("path/to/image.jpg")
>>> outputs = mini_cpm.run(query="What do you see?", media=img)