OWLv2

class grid.model.perception.detection.owlv2.OWLv2(*args, **kwargs)

OWLv2: Open-vocabulary Object Detection Model

This class implements a wrapper for the OWLv2 model, which detects objects in RGB images based on a text prompt.

Credits:

https://arxiv.org/abs/2306.09683

Code:

https://huggingface.co/google/owlv2-base-patch16-ensemble

License:

This code is licensed under the Apache 2.0 License.

__init__(box_threshold=0.2)

Initialize the OWLv2 model.

Parameters:

box_threshold (float, optional) -- Confidence score threshold used to filter detected boxes. Defaults to 0.2.

Return type:

None
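The sketch below illustrates what a score threshold such as box_threshold typically does. It is not the wrapper's actual implementation: the helper name filter_by_score, the strict > comparison, and the sample values are all assumptions made for illustration.

```python
import numpy as np


def filter_by_score(boxes, scores, labels, threshold=0.2):
    """Keep only detections whose score is strictly above the threshold.

    Hypothetical helper: the strict comparison is an assumption, not
    documented behavior of the OWLv2 wrapper.
    """
    keep = scores > threshold  # boolean mask of shape (N,)
    return boxes[keep], scores[keep], labels[keep]


# Made-up detections used only to exercise the helper.
sample_boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 2, 3, 4]])
sample_scores = np.array([0.10, 0.25, 0.90])
sample_labels = np.array(["a", "b", "c"])

kept_boxes, kept_scores, kept_labels = filter_by_score(
    sample_boxes, sample_scores, sample_labels, threshold=0.2
)
```

With these sample values, only the two detections scoring above 0.2 remain.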

detect_object(rgbimage, text_prompt)

Detect the objects named in text_prompt in the RGB image and return their bounding boxes, scores, and labels.

Parameters:
  • rgbimage (np.ndarray) -- Target RGB image represented as a numpy array of shape (H, W, 3).

  • text_prompt (str) -- Text prompt specifying the objects to detect. Multiple objects can be listed in one prompt, separated by commas (,).
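When the object names come from a list rather than a hand-written string, a small helper can build the comma-separated prompt. This is a minimal sketch; build_text_prompt is a hypothetical name and not part of the OWLv2 class.

```python
def build_text_prompt(object_names):
    """Join object names into one comma-separated prompt string.

    Hypothetical helper: strips surrounding whitespace and skips empty names.
    """
    cleaned = [name.strip() for name in object_names if name.strip()]
    return ", ".join(cleaned)


prompt = build_text_prompt(["fire", " redline ", ""])
```

The resulting string can then be passed as the text_prompt argument of detect_object.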

Returns:

  • bounding boxes (np.ndarray) -- Bounding boxes in 2D pixel coordinates relative to the image, in xyxy format. Shape (N, 4).

  • scores (np.ndarray) -- Confidence scores of the detected bounding boxes. Shape (N,).

  • labels (np.ndarray) -- Labels corresponding to the detected objects. Shape (N,).

Return type:

Tuple[Optional[np.ndarray], Optional[np.ndarray], Optional[np.ndarray]]
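Because each element of the returned tuple is Optional, callers should guard against None before indexing. The sketch below shows one way to do that and to turn xyxy boxes into widths and heights. The helper name summarize_detections and the sample arrays are hypothetical, and treating None as "no detections" is an assumption, since the conditions under which None is returned are not documented here.

```python
import numpy as np


def summarize_detections(boxes, scores, labels):
    """Return one dict per detection, sorted by descending score.

    Hypothetical helper: treats a None result as "no detections".
    """
    if boxes is None or scores is None or labels is None:
        return []

    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float).reshape(-1)
    labels = np.asarray(labels).reshape(-1)

    results = []
    for i in np.argsort(-scores):  # highest score first
        x_min, y_min, x_max, y_max = boxes[i]  # xyxy pixel coordinates
        results.append(
            {
                "label": str(labels[i]),
                "score": float(scores[i]),
                "width": float(x_max - x_min),
                "height": float(y_max - y_min),
            }
        )
    return results


# Made-up values standing in for a real detect_object result.
example_boxes = np.array([[10, 20, 110, 70], [0, 0, 50, 50]])
example_scores = np.array([0.35, 0.80])
example_labels = np.array(["fire", "redline"])

summary = summarize_detections(example_boxes, example_scores, example_labels)
```

In real use, the three arguments would be the values unpacked from detect_object.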

Example

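>>> from grid.model.perception.detection.owlv2 import OWLv2
>>> owlv2_model = OWLv2()
>>> # img: RGB image as a numpy array of shape (H, W, 3)
>>> boxes, scores, labels = owlv2_model.detect_object(img, "fire, redline")
>>> print(boxes, scores, labels)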