Applying OCR to Video Processing

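This article explores how to combine optical character recognition (OCR) with computer vision tools to process video files. With this approach, the IDs of intermodal shipping containers in a video can be read. For example, take a video showing several shipping containers, each marked with one or more identification numbers. An object detection model from Roboflow Universe locates the ID regions, and OCR then reads the numbers.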

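First, set up a callback function that will later run OCR on the detected objects. This example uses EasyOCR. Other OCR options and their performance are covered in related blog posts. The following code sets up the callback: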


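import re

import easyocr
import supervision as sv

# Load the English recognition model once and reuse it for every frame.
reader = easyocr.Reader(['en'])

def run_ocr(frame, box):
  # Crop the frame to the detected bounding box (xyxy coordinates).
  cropped_frame = sv.crop_image(frame, box)

  # detail=0 returns only the recognized strings, without boxes or scores.
  result = reader.readtext(cropped_frame, detail=0)

  # Join the fragments and keep digits only.
  text = "".join(result)
  text = re.sub(r'[^0-9]', '', text)

  return text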
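The last two lines of `run_ocr` join the OCR fragments and strip everything that is not a digit. As a standalone illustration of that cleanup step, here is a small sketch; the `keep_digits` name and the sample fragments are made up for this example and do not come from the original code.

```python
import re

def keep_digits(text: str) -> str:
    """Remove every character that is not an ASCII digit (0-9)."""
    return re.sub(r"[^0-9]", "", text)

# Hypothetical OCR fragments for one cropped ID region.
fragments = ["ID 4821", "-07 B"]
print(keep_digits("".join(fragments)))  # prints 482107
```

Note that this filter also discards letters that might belong to an identifier, so it only suits cases where the ID is purely numeric.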

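In this use case only the digits that make up the ID are needed, so every non-digit character in the recognized text is removed. Next, Supervision's `process_video` function drives a two-step process: run detection on each frame, then run OCR on the detections. Each frame can then be annotated to produce a video labeled with the recognized text.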


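import supervision as sv

# `predict` is not shown here; it is assumed to run the object detection
# model on a frame and return an sv.Detections object.
def video_callback(frame, i):
  detections = predict(frame)
  # Keep only the ID class (class_id 2 in this model).
  detections = detections[detections.class_id == 2]

  labels = []
  for detection in detections:
    # Each item yielded by sv.Detections is a tuple; element 0 is the xyxy box.
    detected_text = run_ocr(frame, detection[0])
    labels.append(detected_text)

  # Draw boxes and OCR text on a copy so the source frame stays untouched.
  annotated_frame = frame.copy()
  annotated_frame = sv.BoundingBoxAnnotator().annotate(annotated_frame, detections)
  annotated_frame = sv.LabelAnnotator().annotate(annotated_frame, detections, labels)

  return annotated_frame

# Run the callback on every frame and write the annotated video.
sv.process_video(VIDEO_PATH, "cargo_rawocr.mp4", video_callback)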

These steps produce an annotated video in which each detection is labeled with its recognized text.

Applying Tracking to Identify Unique Items in Video

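Although OCR can be run on every frame, doing so is needlessly inefficient and costly, and not especially useful in a production setting. Building on the previous example, this section keeps the object detection model and OCR, then adds object tracking through custom code to "link" each recognized container with its corresponding ID, so that OCR runs a handful of times rather than hundreds. Because the same ID appears across many frames, those repeated readings can also feed a consensus step that improves accuracy.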

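For readability, some of the code that generates the final output is omitted from this article. The complete code is available separately. With the ByteTrack implementation in Supervision, each object can be tracked across the frames in which it appears. A helper class keeps track of the OCR results and ensures that the final text is the one recognized most often across ten OCR attempts.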
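The helper class itself is not included in the article. The following is a minimal sketch of what such a class might look like, assuming only the two methods the callback uses (`add_candidate` and `winner`) and a threshold of ten readings. The internals, the `required_attempts` parameter, and the choice to ignore empty readings are assumptions made for illustration, not the author's implementation.

```python
from collections import Counter, defaultdict

class Consensus:
    """Collect OCR readings per tracked object and pick the most frequent one."""

    def __init__(self, required_attempts: int = 10):
        # Number of readings to gather before a winner is declared (assumed).
        self.required_attempts = required_attempts
        self.candidates = defaultdict(list)

    def add_candidate(self, key, text):
        # Skip empty readings so failed OCR attempts do not count as votes.
        if text:
            self.candidates[key].append(text)

    def winner(self, key):
        # Return None until enough readings exist for this key.
        readings = self.candidates.get(key, [])
        if len(readings) < self.required_attempts:
            return None
        return Counter(readings).most_common(1)[0][0]

# Usage with a lower threshold and made-up readings for tracker ID 7.
votes = Consensus(required_attempts=3)
for reading in ["4821", "4827", "4821"]:
    votes.add_candidate(7, reading)
print(votes.winner(7))  # prints 4821
```

Returning `None` until the threshold is reached lets the caller use the result as a simple "already decided" check, which is how the callback skips further OCR for a container.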


import supervision as sv
import cv2
import numpy as np

# Initialize ByteTrack
tracker = sv.ByteTrack()

# Consensus helper that collects OCR candidates for each container
container_ids_tracker = Consensus()

# Keeps track of the detected container IDs
container_ids = {}

# Callback run on every video frame
def video_callback(frame, i):
  detections = predict(frame)
  # Assign a persistent tracker ID to each detection.
  detections = tracker.update_with_detections(detections)

  # Class 1 is the container, class 2 is the ID region.
  relevant_detections = detections[(detections.class_id == 1) | (detections.class_id == 2)]
  container_detections = detections[detections.class_id == 1]
  id_detections = detections[detections.class_id == 2]

  for i_idx, id_detection in enumerate(id_detections):
      id_box = id_detection[0]
      for c_idx, container_detection in enumerate(container_detections):
          # If the ID lies inside the container, run OCR.
          if check_within(id_box, container_detection[0]):
              # Element 4 of a detection tuple is its tracker ID.
              parent_container_id = container_detection[4]

              # Skip containers whose ID has already been decided.
              container_id_winner = container_ids_tracker.winner(parent_container_id)
              if container_id_winner:
                  continue

              ocr_result = ocr(frame, id_box, id_detection[4])
              container_ids_tracker.add_candidate(parent_container_id, ocr_result)

  # Video annotation label code...

  annotated_frame = frame.copy()
  annotated_frame = sv.BoundingBoxAnnotator().annotate(annotated_frame, relevant_detections)
  # More video annotation code...

  return annotated_frame

sv.process_video(VIDEO_PATH, "cargo_processed.mp4", video_callback)

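Running the code produces the final annotated video along with text data listing which containers are present, which could be useful for dock management use cases.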
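The tracking callback relies on a `check_within` helper that is among the code left out of the article. One plausible version, sketched here under the assumption that both boxes use the `(x_min, y_min, x_max, y_max)` layout and that "within" means fully contained, could look like this. The actual helper may use a looser test, such as partial overlap.

```python
def check_within(inner_box, outer_box) -> bool:
    """Return True if inner_box lies entirely inside outer_box.

    Both boxes are sequences of four numbers: (x_min, y_min, x_max, y_max).
    """
    ix1, iy1, ix2, iy2 = inner_box
    ox1, oy1, ox2, oy2 = outer_box
    return bool(ix1 >= ox1 and iy1 >= oy1 and ix2 <= ox2 and iy2 <= oy2)

# Made-up coordinates: an ID plate inside a container, and one outside it.
container = (100, 50, 600, 400)
print(check_within((450, 80, 580, 120), container))  # prints True
print(check_within((590, 80, 720, 120), container))  # prints False
```

Strict containment is the simplest rule, but a detector can place an ID box slightly past the container edge, so a small tolerance or an overlap ratio may be more forgiving in practice.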

沪ICP备2024098111号-1

Shanghai Qiudan Network Technology Center: Building 1, No. 8218 Jinda Highway, Fengxian District, Shanghai. Tel: 17898875485
