Researchers at Carnegie Mellon University have recently developed a model that enables fast and accurate object detection in high-resolution 4K and 8K video footage using GPUs. Their attention pipeline method evaluates every image or video frame in two stages, first at a coarse resolution and then at a refined one, which limits the total number of evaluations necessary.
In recent years, machine learning has attained remarkable results in computer vision tasks, including object detection. However, most object recognition models perform best on images with a relatively low resolution. As the resolution of recording devices rapidly improves, there is a rising need for tools that can process high-resolution data.
“We were interested in finding and overcoming the limitations of current approaches,” Vít Růžička, one of the researchers who carried out the study, told TechXplore. “While plenty of data sources record in high resolution, current state-of-the-art object detection models, such as YOLO, Faster RCNN, SSD, etc., work with images that have a relatively low resolution of approximately 608 × 608 px. Our main objective was to scale the object detection task to 4K-8K videos (up to 7680 × 4320 px) while maintaining high processing speed. We also wanted to understand if and by how much we can benefit from high resolution compared to using low-resolution images, in terms of accuracy of the models.”
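To give a sense of why a two-stage evaluation can reduce the workload, the short Python sketch below counts how many 608 × 608 px tiles are needed to cover a 7680 × 4320 px frame, and how many remain once a coarse pass has flagged regions of interest. It is only an illustration under stated assumptions, not the researchers' implementation: the non-overlapping grid, the helper names and the two coarse-stage boxes are all hypothetical, and no actual detector is run.

```python
# Illustrative sketch only: a coarse pass flags regions, and the fine pass
# visits just the full-resolution tiles that touch those regions.
# The grid layout, function names and example boxes are assumptions.

def tile_grid(width, height, tile):
    """Cover a width x height frame with non-overlapping (x0, y0, x1, y1) tiles."""
    tiles = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.append((x, y, min(x + tile, width), min(y + tile, height)))
    return tiles


def intersects(a, b):
    """True if two (x0, y0, x1, y1) rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_active_tiles(tiles, coarse_boxes):
    """Keep only the tiles that overlap at least one coarse-stage box."""
    return [t for t in tiles if any(intersects(t, b) for b in coarse_boxes)]


FRAME_W, FRAME_H = 7680, 4320   # 8K frame size quoted in the article
TILE = 608                      # approximate detector input size quoted in the article

fine_tiles = tile_grid(FRAME_W, FRAME_H, TILE)

# Hypothetical output of a coarse, low-resolution pass, in full-frame pixels.
coarse_boxes = [(100, 200, 400, 700), (5000, 3000, 5300, 3400)]

active = select_active_tiles(fine_tiles, coarse_boxes)
print(f"exhaustive fine evaluations: {len(fine_tiles)}")
print(f"fine evaluations after coarse pass: {len(active)}")
```

With these made-up boxes, the exhaustive grid contains 104 tiles, while only 4 of them overlap a flagged region. The actual saving would depend on how many objects a frame contains and where they sit.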