GroundingDINO
- class grid.model.perception.detection.gdino.GroundingDINO(*args, **kwargs)
GroundingDINO: Open-set Object Detection Model
This class implements a wrapper for the GroundingDINO model, which detects objects in RGB images based on text prompts.
- Credits:
- License:
This code is licensed under the Apache 2.0 License.
- __init__(box_threshold=0.4, text_threshold=0.25)
Initialize the GroundingDINO model.
- Parameters:
box_threshold (float, optional) -- The threshold value for bounding box confidence. Defaults to 0.4.
text_threshold (float, optional) -- The threshold value for text confidence. Defaults to 0.25.
- Return type:
None
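Example
A minimal sketch of constructing the model with custom thresholds, using the keyword arguments documented above; the import path follows the class signature at the top of this page, and the threshold values are illustrative:
>>> from grid.model.perception.detection.gdino import GroundingDINO
>>> gdinomodel = GroundingDINO(box_threshold=0.35, text_threshold=0.25)  # lower box_threshold admits more, lower-confidence boxes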
- detect_object(rgbimage, text_prompt)
Detect the objects specified by text_prompt in the RGB image and return their bounding boxes and phrases.
- Parameters:
rgbimage (np.ndarray) -- Target RGB image represented as a numpy array of shape (H, W, 3).
text_prompt (str) -- Text prompt specifying the objects to be detected. Multiple objects can be specified, separated by a period, e.g. "dog . cake".
- Returns:
bounding boxes (np.ndarray): Bounding boxes as 2D pixel coordinates with respect to the image, in xyxy format, shape (N, 4).
phrases (List[str]): Object names corresponding to the boxes, length N.
- Return type:
Tuple[np.ndarray, List[str]]
Example
>>> gdinomodel = GroundingDINO()
>>> boxes, phrases = gdinomodel.detect_object(img, "dog . cake")
>>> print(boxes, phrases)
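A slightly fuller sketch of end-to-end usage. The image path and PIL-based loading are illustrative assumptions, not part of this API; only detect_object and its documented inputs and outputs come from this page:
>>> import numpy as np
>>> from PIL import Image
>>> from grid.model.perception.detection.gdino import GroundingDINO
>>> img = np.array(Image.open("scene.jpg").convert("RGB"))  # hypothetical file; yields an (H, W, 3) array
>>> gdinomodel = GroundingDINO(box_threshold=0.4, text_threshold=0.25)
>>> boxes, phrases = gdinomodel.detect_object(img, "dog . cake")
>>> for (x1, y1, x2, y2), phrase in zip(boxes, phrases):
...     print(f"{phrase}: ({x1:.0f}, {y1:.0f}) -> ({x2:.0f}, {y2:.0f})")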