RT_DETR

class grid.model.perception.detection.rt_detr.RT_DETR(*args, **kwargs)

RT_DETR: Object Detection Model

This class wraps the RT-DETR (Real-Time DEtection TRansformer) model, which detects objects in images and videos.

Credits:

https://github.com/lyuwenyu/RT-DETR

License:

This code is licensed under the Apache 2.0 License.

__init__()

Initialize the RT_DETR model.

The model is loaded onto the GPU if one is available; otherwise it is loaded onto the CPU.

Return type:

None
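The GPU-or-CPU fallback described for __init__() can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: it assumes the model is backed by PyTorch, and `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: the wrapper is backed by PyTorch
    except ImportError:
        # Without PyTorch there is no GPU runtime to query, so use the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

A model object would then typically be moved with `model.to(device)` before inference.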

annotate_image(input_image, detections, labels)

Annotates the input image with bounding boxes, masks, and labels.

Parameters:
  • input_image (np.ndarray) -- The input image to be annotated.

  • detections (sv.Detections) -- The detected objects.

  • labels (List[str]) -- The labels for the detected objects.

Returns:

The annotated image.

Return type:

np.ndarray
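annotate_image() expects one label string per detection. A minimal sketch of building that list from class ids and scores is shown below; `CLASS_NAMES` and `format_labels` are hypothetical, and the real class names depend on the dataset the model was trained on.

```python
from typing import Dict, List

# Hypothetical id-to-name mapping used only for illustration.
CLASS_NAMES: Dict[int, str] = {0: "person", 1: "bicycle", 2: "car"}


def format_labels(class_ids: List[int], scores: List[float]) -> List[str]:
    """Build one "name score" string per detection, in detection order."""
    labels = []
    for class_id, score in zip(class_ids, scores):
        # Fall back to a generic name when the id is not in the mapping.
        name = CLASS_NAMES.get(class_id, "class_%d" % class_id)
        labels.append("%s %.2f" % (name, score))
    return labels


labels = format_labels([0, 2, 7], [0.91, 0.75, 0.5])
print(labels)
```

The resulting list has the same length and order as the detections, which is what a labels argument of type List[str] implies.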

process_image(image, confidence_threshold)

Processes an image and performs object detection.

Parameters:
  • image (np.ndarray) -- The input image.

  • confidence_threshold (float) -- Confidence threshold for object detection.

Returns:

  • boxes (List[Tuple[int]]) -- List of bounding boxes.

  • scores (List[float]) -- List of confidence scores.

  • labels (List[int]) -- List of class labels.

Return type:

Tuple[List[Tuple[int]], List[float], List[int]]
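The three returned lists are parallel: entry i of each list describes the same detection. The sketch below applies a stricter score cutoff after the fact while keeping the lists aligned. The values are made up, `keep_confident` is a hypothetical helper, and the (x_min, y_min, x_max, y_max) box layout is an assumption that the reference above does not confirm.

```python
from typing import List, Tuple

# Assumption: each box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def keep_confident(
    boxes: List[Box], scores: List[float], labels: List[int], min_score: float
) -> Tuple[List[Box], List[float], List[int]]:
    """Drop detections scoring below min_score, keeping the lists aligned."""
    kept_boxes: List[Box] = []
    kept_scores: List[float] = []
    kept_labels: List[int] = []
    for box, score, label in zip(boxes, scores, labels):
        if score >= min_score:
            kept_boxes.append(box)
            kept_scores.append(score)
            kept_labels.append(label)
    return kept_boxes, kept_scores, kept_labels


# Made-up values shaped like the documented return tuple.
boxes = [(10, 20, 110, 220), (5, 5, 50, 50), (200, 40, 260, 90)]
scores = [0.92, 0.31, 0.67]
labels = [0, 2, 2]

result = keep_confident(boxes, scores, labels, 0.5)
print(result)
```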

process_video(video_path, confidence_threshold)

Processes a video and performs object detection on each frame.

Parameters:
  • video_path (str) -- The path to the video file.

  • confidence_threshold (float) -- Confidence threshold for object detection.

Returns:

  • boxes (List[Tuple[int]]) -- List of bounding boxes.

  • scores (List[float]) -- List of confidence scores.

  • labels (List[int]) -- List of class labels.

Return type:

Tuple[List[Tuple[int]], List[float], List[int]]

query(image, confidence_threshold)
run(input, confidence_threshold)

Processes an image or video and performs object detection.

Parameters:
  • input (Union[np.ndarray, str]) -- The image array or path to the video file.

  • confidence_threshold (float) -- Confidence threshold for object detection.

Returns:

  • boxes (List[Tuple[int]]) -- List of bounding boxes.

  • scores (List[float]) -- List of confidence scores.

  • labels (List[int]) -- List of class labels.

Return type:

Tuple[List[Tuple[int]], List[float], List[int]]

Example

>>> object_detector = RT_DETR()
>>> video_path = "path/to/video.mp4"
>>> boxes, scores, labels = object_detector.run(video_path, 0.5)
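Because run() accepts either an image array or a video path, a caller-side view of that dispatch can be sketched as below. This is one plausible routing rule and is not taken from the library source; `dispatch`, `fake_image_handler`, and `fake_video_handler` are hypothetical stand-ins for process_image() and process_video().

```python
from typing import Any, Callable, List, Tuple

Result = Tuple[List[tuple], List[float], List[int]]


def dispatch(
    input_data: Any,
    confidence_threshold: float,
    on_image: Callable[[Any, float], Result],
    on_video: Callable[[str, float], Result],
) -> Result:
    """Route a string to the video handler and anything else to the image handler."""
    if isinstance(input_data, str):
        # Assumption: a string is always a path to a video file.
        return on_video(input_data, confidence_threshold)
    return on_image(input_data, confidence_threshold)


def fake_image_handler(image: Any, threshold: float) -> Result:
    # Stand-in that returns one made-up detection.
    return [(0, 0, 10, 10)], [0.9], [1]


def fake_video_handler(path: str, threshold: float) -> Result:
    # Stand-in that returns no detections.
    return [], [], []


image_result = dispatch([[0, 0], [0, 0]], 0.5, fake_image_handler, fake_video_handler)
video_result = dispatch("path/to/video.mp4", 0.5, fake_image_handler, fake_video_handler)
print(image_result, video_result)
```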