GroundingDINO

class grid.model.perception.detection.gdino.GroundingDINO(*args, **kwargs)

GroundingDINO: Open-set Object Detection Model

This class implements a wrapper for the GroundingDINO model, which detects objects in RGB images based on text prompts.

Credits:

https://github.com/IDEA-Research/GroundingDINO

License:

This code is licensed under the Apache 2.0 License.

__init__(box_threshold=0.4, text_threshold=0.25)

Initialize the GroundingDINO model.

Parameters:
  • box_threshold (float, optional) -- The threshold value for bounding box confidence. Defaults to 0.4.

  • text_threshold (float, optional) -- The threshold value for text confidence. Defaults to 0.25.

Return type:

None

detect_object(rgbimage, text_prompt)

Detect the objects specified by text_prompt in the RGB image and return their bounding boxes and matched phrases.

Parameters:
  • rgbimage (np.ndarray) -- Target RGB image represented as a numpy array of shape (H, W, 3).

  • text_prompt (str) -- Text prompt specifying the objects to be detected. Multiple objects can be specified, separated by a period, e.g. "dog . cake".

Returns:

  • bounding boxes (np.ndarray) -- Bounding boxes in xyxy format, as 2D pixel coordinates with respect to the image. Shape (N, 4).

  • phrases (List[str]) -- Object names corresponding to the boxes. Length N.

Return type:

Tuple[np.ndarray, List[str]]

Example

>>> gdinomodel = GroundingDINO()
>>> boxes, phrases = gdinomodel.detect_object(img, "dog . cake")
>>> print(boxes, phrases)
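
The (N, 4) boxes and length-N phrase list returned by detect_object can be post-processed without the model. A minimal sketch, using dummy detections shaped like detect_object's output (the helper names filter_by_phrase and box_centers are hypothetical, not part of this API):

```python
import numpy as np
from typing import List

def filter_by_phrase(boxes: np.ndarray, phrases: List[str], target: str) -> np.ndarray:
    """Keep only the boxes whose detected phrase matches `target` (hypothetical helper)."""
    mask = np.array([p == target for p in phrases], dtype=bool)
    return boxes[mask]

def box_centers(boxes: np.ndarray) -> np.ndarray:
    """Centers (cx, cy) of xyxy boxes of shape (N, 4)."""
    return np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                     (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

# Dummy output mimicking detect_object(img, "dog . cake"):
boxes = np.array([[10, 10, 50, 50], [60, 20, 100, 80]], dtype=float)
phrases = ["dog", "cake"]

dog_boxes = filter_by_phrase(boxes, phrases, "dog")
print(dog_boxes)               # [[10. 10. 50. 50.]]
print(box_centers(dog_boxes))  # [[30. 30.]]
```

Because the boxes are in xyxy pixel coordinates, centers, widths, and heights follow directly from column arithmetic.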