GSAM2

class grid.model.perception.segmentation.gsam2.GSAM2(*args, **kwargs)

GSAM2: Grounded Segment Anything 2.1 Model

This class implements a wrapper for the GSAM2 model, which combines Grounding DINO for text-prompted object detection with SAM2 for high-precision segmentation of RGB images.

Credits:

https://github.com/facebookresearch/segment-anything-2
https://github.com/IDEA-Research/GroundingDINO

License:

This code is licensed under the Apache 2.0 and BSD-3 licenses.

__init__(model_size='large', box_threshold=0.35, text_threshold=0.25, nms_threshold=0.8)

Initialize the GSAM2 model with Grounding DINO and SAM2 components.

Parameters:
  • model_size (str) -- The size of the SAM2 model to use. Options are "tiny", "small", "base_plus", and "large".

  • box_threshold (float) -- Confidence threshold for bounding boxes predicted by Grounding DINO.

  • text_threshold (float) -- Confidence threshold for associating detected boxes with phrases in the text prompt.

  • nms_threshold (float) -- Non-maximum suppression (NMS) threshold for filtering overlapping boxes.

Return type:

None
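
Example:

A minimal usage sketch, assuming the module path shown above is importable; the keyword values given here are simply the documented defaults:

    from grid.model.perception.segmentation.gsam2 import GSAM2

    # Instantiate with the documented defaults; a smaller SAM2 backbone
    # such as "tiny" trades segmentation quality for speed.
    model = GSAM2(
        model_size="large",
        box_threshold=0.35,
        text_threshold=0.25,
        nms_threshold=0.8,
    )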

segment_object(rgbimage, text_prompt)

Segment objects in an RGB image based on a text prompt.

This function uses Grounding DINO to detect objects that match the text prompt and SAM2 to segment each detected object.

Parameters:
  • rgbimage (str) -- Path to the RGB image file.

  • text_prompt (str) -- Text prompt to guide object detection.

Returns:

List of segmentation masks, one per detected object.

Return type:

List[np.ndarray]
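
Example:

A hedged sketch of a typical call; the image path and prompt are placeholders, and the period-separated lowercase prompt follows Grounding DINO's usual phrasing convention:

    import numpy as np

    from grid.model.perception.segmentation.gsam2 import GSAM2

    model = GSAM2(model_size="large")

    # "photo.jpg" is a placeholder path; each phrase in the prompt is a
    # candidate object category for Grounding DINO.
    masks = model.segment_object("photo.jpg", "a red cup. a laptop.")

    for i, mask in enumerate(masks):
        # Each mask is an np.ndarray aligned with the input image; the
        # nonzero pixel count gives a rough object-size estimate.
        print(f"object {i}: {int(np.count_nonzero(mask))} mask pixels")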