OWLSAM
- class grid.model.perception.segmentation.owlsam.OWLSAM(*args, **kwargs)
OWLSAM model for object segmentation, combining Segment Anything (SAM) and OWLv2.
This class implements a wrapper for the OWLSAM model, which detects and segments objects in RGB images based on text prompts.
- Credits:
https://github.com/facebookresearch/segment-anything and https://huggingface.co/google/owlv2-base-patch16-ensemble
- License:
This code is licensed under the Apache 2.0 License.
- __init__(box_threshold=0.25, text_threshold=0.25, nms_threshold=0.8)
Initialize OWLSAM model.
- Parameters:
box_threshold (float, optional) -- Confidence threshold for bounding-box predictions. Defaults to 0.25.
text_threshold (float, optional) -- Confidence threshold for matching text queries to detections. Defaults to 0.25.
nms_threshold (float, optional) -- IoU threshold for non-maximum suppression. Defaults to 0.8.
- Return type:
None
- segment_object(rgbimage, text_prompt)
Detect and segment the objects specified by text_prompt in the given RGB image.
- Parameters:
rgbimage (np.ndarray) -- Target RGB image represented as a numpy array of shape (H, W, 3).
text_prompt (str) -- Text prompt specifying the objects to be detected and segmented.
- Returns:
A list of binary object masks, each an np.ndarray of shape (H, W, 1) with object pixels set to 1 and all other pixels set to 0. The list is empty if no object is detected.
- Return type:
List[np.ndarray]
Example
>>> from grid.model.perception.segmentation.owlsam import OWLSAM
>>> owlsam = OWLSAM()
>>> res = owlsam.segment_object(img, "turbine")
>>> if len(res) > 0:
...     mask = res[0]  # mask of shape (H, W, 1)
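A common follow-up step is applying a returned mask to the source image. The sketch below, which assumes only the documented mask format (an np.ndarray of shape (H, W, 1) with values 0 and 1), uses a small synthetic mask in place of a real segment_object result; the image and mask contents are illustrative, not outputs of the model.

```python
import numpy as np

# Synthetic stand-ins for a real image and a mask returned by segment_object.
h, w = 4, 6
img = np.full((h, w, 3), 200, dtype=np.uint8)   # uniform gray RGB image
mask = np.zeros((h, w, 1), dtype=np.uint8)      # shape (H, W, 1), values 0/1
mask[1:3, 2:5, 0] = 1                           # a 2x3 "object" region

# Broadcasting (H, W, 1) against (H, W, 3) zeroes out non-object pixels.
masked = img * mask

# The mask sum gives the object's pixel count.
num_object_pixels = int(mask.sum())
```

Because the mask's trailing dimension is 1, NumPy broadcasting applies it across all three color channels without an explicit repeat.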