OWLSAM

class grid.model.perception.segmentation.owlsam.OWLSAM(*args, **kwargs)

OWLSAM model for object segmentation, combining OWLv2 detection with Segment Anything segmentation.

This class implements a wrapper for the OWLSAM model, which detects and segments objects in RGB images based on text prompts.

Credits:

https://github.com/facebookresearch/segment-anything and https://huggingface.co/google/owlv2-base-patch16-ensemble

License:

This code is licensed under the Apache 2.0 License.

__init__(box_threshold=0.25, text_threshold=0.25, nms_threshold=0.8)

Initialize OWLSAM model.

Parameters:
  • box_threshold (float, optional) -- Confidence threshold for keeping predicted bounding boxes. Defaults to 0.25.

  • text_threshold (float, optional) -- Confidence threshold for matching predicted boxes to the text prompt. Defaults to 0.25.

  • nms_threshold (float, optional) -- IoU threshold for non-maximum suppression; boxes overlapping a higher-scoring box above this value are discarded. Defaults to 0.8.

Return type:

None
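To illustrate what nms_threshold controls, here is a minimal, self-contained sketch of IoU-based non-maximum suppression. This is not the OWLSAM implementation; the iou and nms helpers are hypothetical stand-ins written in NumPy to show how the threshold decides which overlapping boxes survive.

```python
import numpy as np

def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union of two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, nms_threshold=0.8):
    # Visit boxes in descending score order; keep a box only if its IoU
    # with every already-kept box stays below nms_threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < nms_threshold for j in keep):
            keep.append(int(i))
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, nms_threshold=0.5))  # → [0, 2]
```

With nms_threshold=0.5 the second box (IoU ≈ 0.68 with the first) is suppressed; at the default 0.8 it would be kept, so a higher threshold tolerates more overlapping detections.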

segment_object(rgbimage, text_prompt)

Detect and segment objects from the RGB image where the target objects are specified by text_prompt.

Parameters:
  • rgbimage (np.ndarray) -- Target RGB image represented as a numpy array of shape (H, W, 3).

  • text_prompt (str) -- Text prompt specifying the objects to be detected and segmented.

Returns:

List of object masks, each an np.ndarray of shape (H, W, 1) in which object pixels are 1 and all other pixels are 0. The list is empty if no object is detected.

Return type:

List[np.ndarray]

Example

>>> from grid.model.perception.segmentation.owlsam import OWLSAM
>>> owlsam = OWLSAM()
>>> res = owlsam.segment_object(img, "turbine")
>>> if len(res) > 0:
...     mask = res[0]  # mask of shape (H, W, 1)
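A common next step is applying a returned mask to the source image. The sketch below uses only NumPy and a hypothetical hand-built mask in the documented (H, W, 1) format, so it does not require the model itself; broadcasting against the (H, W, 3) image zeroes out background pixels.

```python
import numpy as np

# Hypothetical mask in the documented format: shape (H, W, 1),
# object pixels 1, all others 0.
H, W = 4, 4
mask = np.zeros((H, W, 1), dtype=np.uint8)
mask[1:3, 1:3, 0] = 1

img = np.full((H, W, 3), 200, dtype=np.uint8)  # stand-in RGB image

# (H, W, 3) * (H, W, 1) broadcasts the mask over the color channels.
segmented = img * mask

# Pixel coordinates of the object, e.g. for computing a centroid.
ys, xs = np.nonzero(mask[:, :, 0])
centroid = (ys.mean(), xs.mean())
print(segmented[1, 1], centroid)  # → [200 200 200] (1.5, 1.5)
```

Because object pixels are exactly 1, multiplication preserves the original colors inside the mask and blacks out everything else; for soft blending, convert the mask to float first.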