YOLO Model Optimization and Decoding on iOS Devices

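When deploying machine learning models on iOS devices, you often have to balance model performance against the device's resource consumption. To save memory without sacrificing model accuracy, you can reduce the model's weights from 32-bit to 16-bit precision. Note that when a model runs on the GPU or the Neural Engine of an iOS device, it always runs with 16-bit floats. Keeping the weights in 32-bit precision only makes a difference when the model runs on the CPU.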

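The conversion can be done in Python with the coremltools library. The following code loads the model, reduces its weights to 16 bits, and saves the result: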


import os
import coremltools as ct
import numpy as np

model_converted = ct.models.MLModel(
    './models/yolov2-coco-9.mlmodel'
)

model_converted = ct.models.neural_network.quantization_utils.quantize_weights(
    model_converted, 
    nbits=16, 
    quantization_mode='linear'
)

model_converted.save(
    './models/yolov2-16.mlmodel'
)
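
To get a feel for what 16-bit weights buy, the following NumPy-only sketch casts a made-up weight tensor to half precision and compares the storage size and rounding error. The tensor shape and values are assumptions chosen for illustration, and the snippet does not reproduce what quantize_weights does internally; it only shows the underlying idea of storing float16 instead of float32.

```python
import numpy as np

# Hypothetical weight tensor standing in for one convolution layer's weights.
rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.05, size=(64, 32, 3, 3)).astype(np.float32)

# Store the same values in half precision.
weights_fp16 = weights_fp32.astype(np.float16)

# Half precision uses 2 bytes per value instead of 4.
size_ratio = weights_fp16.nbytes / weights_fp32.nbytes

# Largest difference introduced by rounding to float16.
max_abs_error = float(
    np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))
)

print(f"size ratio: {size_ratio}")
print(f"largest rounding error: {max_abs_error:.2e}")
```

For weights of this magnitude the rounding error is small compared with the values themselves, which is why the reduced precision usually has little effect on predictions.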

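Next, the YOLO decoder has to be built. There are two options: add the decoding layers to the existing model, or create a separate decoder model and connect the two with a pipeline. This article takes the second approach.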

First, create a new NeuralNetworkBuilder instance and map the inputs and outputs of the new decoder model:


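from coremltools.models import datatypes

# Assumed: `spec` is the spec of the converted YOLOv2 model loaded above;
# its first output is the input of the decoder.
spec = model_converted.get_spec()

input_features = [ (spec.description.output[0].name, datatypes.Array(1, 425, 13, 13)) ]
output_features = [ ('all_scores', datatypes.Array(1, 845, 80)), ('all_boxes', datatypes.Array(1, 845, 4)) ]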

builder = ct.models.neural_network.NeuralNetworkBuilder(
    input_features, 
    output_features, 
    disable_rank5_shape_mapping=True
)

Then define the constants needed for the calculations:


GRID_SIZE = 13
CELL_SIZE = 1 / GRID_SIZE 
BOXES_PER_CELL = 5
NUM_CLASSES = 80
ANCHORS_W = np.array([0.57273, 1.87446, 3.33843, 7.88282, 9.77052]).reshape(1, 1, 5)
ANCHORS_H = np.array([0.677385, 2.06253, 5.47434, 3.52778, 9.16828]).reshape(1, 1, 5)

CX = np.tile(np.arange(GRID_SIZE), GRID_SIZE).reshape(1, 1, GRID_SIZE**2, 1)
CY = np.tile(np.arange(GRID_SIZE), GRID_SIZE).reshape(1, GRID_SIZE, GRID_SIZE).transpose()
CY = CY.reshape(1, 1, GRID_SIZE**2, 1)

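To use the model with the Vision framework, the bounding box coordinates have to be scaled to the [0, 1] range rather than expressed in image pixels, which is why CELL_SIZE is defined as 1 / GRID_SIZE.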
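
The grid constants are easy to get wrong, so a quick NumPy check can help. The sketch below rebuilds CX and CY exactly as above and confirms that, with cells enumerated row by row, CX holds each cell's column and CY its row. The 0.5 offset is an assumed example value standing in for a sigmoid output.

```python
import numpy as np

GRID_SIZE = 13
CELL_SIZE = 1 / GRID_SIZE

# Same construction as in the article.
CX = np.tile(np.arange(GRID_SIZE), GRID_SIZE).reshape(1, 1, GRID_SIZE**2, 1)
CY = np.tile(np.arange(GRID_SIZE), GRID_SIZE).reshape(1, GRID_SIZE, GRID_SIZE).transpose()
CY = CY.reshape(1, 1, GRID_SIZE**2, 1)

# Cells are enumerated row by row: CX is the column, CY is the row.
cell_index = np.arange(GRID_SIZE**2)
assert np.array_equal(CX.reshape(-1), cell_index % GRID_SIZE)
assert np.array_equal(CY.reshape(-1), cell_index // GRID_SIZE)

# A box centred in the middle of its cell (offset 0.5) stays inside (0, 1)
# once multiplied by CELL_SIZE.
center_x = (CX + 0.5) * CELL_SIZE
print(center_x.min(), center_x.max())
```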

Now the constants can be added to the network:


builder.add_load_constant_nd(
    'CX', output_name='CX', constant_value=CX, shape=CX.shape)

builder.add_load_constant_nd(
    'CY', output_name='CY', constant_value=CY, shape=CY.shape)

builder.add_load_constant_nd(
    'ANCHORS_W', output_name='ANCHORS_W', constant_value=ANCHORS_W, shape=ANCHORS_W.shape)

builder.add_load_constant_nd(
    'ANCHORS_H', output_name='ANCHORS_H', constant_value=ANCHORS_H, shape=ANCHORS_H.shape)

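Next come the layers of the Core ML model. For the most part, this is a direct translation of the code from the previous article, with the same variable/node names wherever possible. Occasionally, Core ML's quirks force some small changes.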

Start with the layers that correspond to the first two transformations of the earlier (vectorized) implementation:


builder.add_transpose(
    'yolo_trans_node', 
    axes=(0, 2, 3, 1), 
    input_name='218', 
    output_name='yolo_transp'
)

builder.add_reshape_static(
    'yolo_reshap',
    input_name='yolo_transp',
    output_name='yolo_reshap',
    output_shape=(1, GRID_SIZE**2, BOXES_PER_CELL, NUM_CLASSES + 5)
)

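When a new layer is created with a NeuralNetworkBuilder instance, the node needs a unique name and an output_name (in the first operation above, "yolo_trans_node" and "yolo_transp" respectively). The input_name value must match an existing output_name (in this case "218", which is the output of the converted YOLOv2 model).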

To extract the encoded box and confidence values, the input array has to be split:


builder.add_split_nd(
    'split_boxes_node', 
    input_name='yolo_reshap',
    output_names=['tx', 'ty', 'tw', 'th', 'tc', 'classes_raw'],    
    axis=3,
    split_sizes=[1, 1, 1, 1, 1, 80]
)

This operation slices the raw_preds array (here, the output of the yolo_reshap node) into the tx, ty, tw, th, tc, and classes_raw arrays.
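
For reference, the transpose, reshape, and split steps can be mirrored in plain NumPy to check the resulting shapes. This is only a sketch: the random array is a stand-in for the real model output, and NumPy's semantics are assumed to match the Core ML layers here.

```python
import numpy as np

GRID_SIZE, BOXES_PER_CELL, NUM_CLASSES = 13, 5, 80

# Random stand-in for the raw model output, shape (1, 425, 13, 13).
rng = np.random.default_rng(0)
raw = rng.normal(size=(1, BOXES_PER_CELL * (NUM_CLASSES + 5), GRID_SIZE, GRID_SIZE))

# Move the channel axis last, then separate cells, boxes and per-box values.
transposed = raw.transpose(0, 2, 3, 1)  # (1, 13, 13, 425)
reshaped = transposed.reshape(1, GRID_SIZE**2, BOXES_PER_CELL, NUM_CLASSES + 5)

# np.split takes cut positions rather than sizes:
# sizes [1, 1, 1, 1, 1, 80] correspond to cuts at indices 1 to 5.
tx, ty, tw, th, tc, classes_raw = np.split(reshaped, [1, 2, 3, 4, 5], axis=3)

print(tx.shape, classes_raw.shape)
```

Each of tx, ty, tw, th, and tc keeps a trailing axis of length 1, which is why the decoder reshapes them before doing any arithmetic.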

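Unfortunately, the rest of the code is much more verbose, because every basic arithmetic operation needs its own node. As a result, a simple one-liner from the vectorized decoder: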


x = ((CX + sigmoid(tx)) * CELL_SIZE).reshape(-1)

becomes:


builder.add_reshape_static(
    'tx:1', input_name='tx', output_name='tx:1', output_shape=(1, 169, 5))

builder.add_activation(
    'tx:1_sigm', non_linearity='SIGMOID', input_name='tx:1', output_name='tx:1_sigm'
)

builder.add_add_broadcastable(
    'tx:1_add', input_names=['CX', 'tx:1_sigm'], output_name='tx:1_add'
)

builder.add_elementwise(
    'x', input_names=['tx:1_add'], output_name='x', mode='MULTIPLY', alpha=CELL_SIZE)
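
When debugging such a chain of nodes, it can help to have a NumPy reference that follows it one step at a time. The sketch below names each intermediate array after the corresponding node; the random tx array is a placeholder, and it is an assumption that Core ML's broadcasting behaves like NumPy's in this case.

```python
import numpy as np

GRID_SIZE, BOXES_PER_CELL = 13, 5
CELL_SIZE = 1 / GRID_SIZE
CX = np.tile(np.arange(GRID_SIZE), GRID_SIZE).reshape(1, 1, GRID_SIZE**2, 1)


def sigmoid(values):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-values))


# Stand-in for the 'tx' output of the split layer, shape (1, 169, 5, 1).
rng = np.random.default_rng(0)
tx = rng.normal(size=(1, GRID_SIZE**2, BOXES_PER_CELL, 1))

tx_1 = tx.reshape(1, GRID_SIZE**2, BOXES_PER_CELL)  # node 'tx:1'
tx_1_sigm = sigmoid(tx_1)                            # node 'tx:1_sigm'
tx_1_add = CX + tx_1_sigm                            # node 'tx:1_add', broadcasts to (1, 1, 169, 5)
x = tx_1_add * CELL_SIZE                             # node 'x'

print(x.shape)
```

Because the sigmoid output lies between 0 and 1 and the column index is at most 12, every x value ends up strictly between 0 and 1.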

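To keep the code shorter and more readable, the output shapes use the explicit value "169" instead of GRID_SIZE**2 and "5" instead of BOXES_PER_CELL. Likewise, "80" appears elsewhere instead of the NUM_CLASSES constant. In a proper, flexible solution, you should of course stick to the named constants.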

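The operations that compute y are the same, with ty and CY in place of tx and CX. Very similar code then computes the bounding box width (w):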


builder.add_reshape_static(
    'tw:1', input_name='tw', output_name='tw:1', output_shape=(1, 169, 5))

builder.add_unary(
    'tw:1_exp', input_name='tw:1', output_name='tw:1_exp', mode='exp'
)

builder.add_multiply_broadcastable(
    'tw:1_mul', input_names=['tw:1_exp', 'ANCHORS_W'], output_name='tw:1_mul'
)

builder.add_elementwise(
    'w', input_names=['tw:1_mul'], output_name='w', mode='MULTIPLY', alpha=CELL_SIZE)

The operations that compute h are again very similar (except that they use the ANCHORS_H constant instead of ANCHORS_W).
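
The width and height calculations can be sanity-checked with NumPy in the same way. In this sketch the raw predictions are set to zero, an artificial choice that makes exp() return 1, so every box should come out with exactly its anchor size scaled by CELL_SIZE.

```python
import numpy as np

GRID_SIZE, BOXES_PER_CELL = 13, 5
CELL_SIZE = 1 / GRID_SIZE
ANCHORS_W = np.array([0.57273, 1.87446, 3.33843, 7.88282, 9.77052]).reshape(1, 1, 5)
ANCHORS_H = np.array([0.677385, 2.06253, 5.47434, 3.52778, 9.16828]).reshape(1, 1, 5)

# Artificial raw predictions: all zeros, shaped like the split outputs.
tw = np.zeros((1, GRID_SIZE**2, BOXES_PER_CELL, 1))
th = np.zeros((1, GRID_SIZE**2, BOXES_PER_CELL, 1))

# Same sequence as the nodes: reshape, exp, multiply by anchors, scale.
w = np.exp(tw.reshape(1, GRID_SIZE**2, BOXES_PER_CELL)) * ANCHORS_W * CELL_SIZE
h = np.exp(th.reshape(1, GRID_SIZE**2, BOXES_PER_CELL)) * ANCHORS_H * CELL_SIZE

print(w.shape, h.shape)
```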

Finally, decode the box_confidence and classes_confidence values:


builder.add_reshape_static(
    'tc:1', input_name='tc', output_name='tc:1', output_shape=(1, 169*5, 1))

builder.add_activation(
    'box_confidence', non_linearity='SIGMOID', input_name='tc:1', output_name='box_confidence'
)

builder.add_reshape_static(
    'classes_raw:1', input_name='classes_raw', output_name='classes_raw:1', output_shape=(1, 169*5, 80)
)

builder.add_softmax_nd(
    'classes_confidence', input_name='classes_raw:1', output_name='classes_confidence', axis=-1
)

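The YOLOv2 prediction decoding described in the previous article returned a single most likely class for each box. The Vision framework expects a confidence value for every box/class combination instead: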


builder.add_multiply_broadcastable(
    'combined_classes_confidence', 
    input_names=[
        'box_confidence',
        'classes_confidence'
    ],
    output_name='combined_classes_confidence'
)
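
As a cross-check, the combined confidence can be sketched in NumPy: the box confidence is a sigmoid of tc, the class confidences are a softmax over the 80 class scores, and their product gives one value per box/class pair. The inputs below are random placeholders, and the softmax helper is written out here only for illustration.

```python
import numpy as np


def sigmoid(values):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-values))


def softmax(values, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = values - values.max(axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=axis, keepdims=True)


# Random stand-ins for the reshaped 'tc:1' and 'classes_raw:1' arrays.
rng = np.random.default_rng(0)
tc = rng.normal(size=(1, 845, 1))
classes_raw = rng.normal(size=(1, 845, 80))

box_confidence = sigmoid(tc)                         # (1, 845, 1)
classes_confidence = softmax(classes_raw, axis=-1)   # (1, 845, 80)

# Broadcasting multiplies every class score by its box's confidence.
combined_classes_confidence = box_confidence * classes_confidence

print(combined_classes_confidence.shape)
```

Since the softmax values of each box sum to 1, the combined confidences of a box sum to that box's confidence.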

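All the required values are now available. The next step is to format them for the Vision framework as two arrays: one with the coordinates of all bounding boxes (four columns per box), and a second with the confidence computed for each box/class combination (80 columns per box).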

This is not a difficult task, but because every transformation has to be handled as a separate operation, it again leads to verbose code:


builder.add_reshape_static(
    'x:1', input_name='x', output_name='x:1', output_shape=(1, 169*5, 1))

builder.add_reshape_static(
    'y:1', input_name='y', output_name='y:1', output_shape=(1, 169*5, 1))

builder.add_reshape_static(
    'w:1', input_name='w', output_name='w:1', output_shape=(1, 169*5, 1))

builder.add_reshape_static(
    'h:1', input_name='h', output_name='h:1', output_shape=(1, 169*5, 1))

builder.add_stack(
    'all_boxes:0', 
    input_names=['x:1', 'y:1', 'w:1', 'h:1'], 
    output_name='all_boxes:0', 
    axis=2)

builder.add_reshape_static(
    'all_boxes', 
    input_name='all_boxes:0', 
    output_name='all_boxes',
    output_shape=(1, 169*5, 4))

builder.add_reshape_static(
    'all_scores', 
    input_name='combined_classes_confidence', 
    output_name='all_scores',
    output_shape=(1, 169*5, 80))


builder.set_output(
    output_names=['all_scores', 'all_boxes'],
    output_dims=[(845, 80), (845, 4)])

model_decoder = ct.models.MLModel(builder.spec)

model_decoder.save('./models/yolov2-decoder.mlmodel')
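
The final stack-and-reshape step can also be mirrored in NumPy to confirm the layout of all_boxes. The coordinate arrays below are random placeholders, and np.stack is assumed to behave like the Core ML stack layer for this case.

```python
import numpy as np

NUM_BOXES = 169 * 5

# Random stand-ins for the 'x:1', 'y:1', 'w:1' and 'h:1' arrays.
rng = np.random.default_rng(0)
x, y, w, h = (rng.random(size=(1, NUM_BOXES, 1)) for _ in range(4))

# Stacking along a new axis 2 gives (1, 845, 4, 1) ...
stacked = np.stack([x, y, w, h], axis=2)

# ... and the reshape drops the trailing axis: one row of (x, y, w, h) per box.
all_boxes = stacked.reshape(1, NUM_BOXES, 4)

print(all_boxes.shape)
```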

沪ICP备2024098111号-1

上海秋旦网络科技中心：上海市奉贤区金大公路8218号1幢联系电话：17898875485
