    mxnet SSD reproduction series

    1. Loading the dataset
    2. The SSD model architecture
    3. The training script
    4. Loss and evaluation functions
    5. Prediction results



    Preface

    This project reads its data in Pascal VOC format; the dataset is the Face Mask Detection dataset provided on Kaggle. The model architecture follows the GluonCV ssd_300_vgg16_atrous_voc source code.


    1. Model architecture

    SSD(
      (features): VGG_atrous(
        (stages): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (3): Activation(relu)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (3): Activation(relu)
          )
          (2): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (3): Activation(relu)
            (4): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (5): Activation(relu)
          )
          (3): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (3): Activation(relu)
            (4): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (5): Activation(relu)
          )
          (4): HybridSequential(
            (0): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (3): Activation(relu)
            (4): Conv2D(None -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (5): Activation(relu)
          )
          (5): HybridSequential(
            (0): Conv2D(None -> 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6))
            (1): Activation(relu)
            (2): Conv2D(None -> 1024, kernel_size=(1, 1), stride=(1, 1))
            (3): Activation(relu)
          )
        )
        (norm4): Normalize(
        
        )
        (extras): HybridSequential(
          (0): HybridSequential(
            (0): Conv2D(None -> 256, kernel_size=(1, 1), stride=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (3): Activation(relu)
          )
          (1): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (3): Activation(relu)
          )
          (2): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1))
            (3): Activation(relu)
          )
          (3): HybridSequential(
            (0): Conv2D(None -> 128, kernel_size=(1, 1), stride=(1, 1))
            (1): Activation(relu)
            (2): Conv2D(None -> 256, kernel_size=(3, 3), stride=(1, 1))
            (3): Activation(relu)
          )
        )
      )
      (bbox_predictor): HybridSequential(
        (0): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2D(None -> 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2D(None -> 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2D(None -> 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (5): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (cls_predictor): HybridSequential(
        (0): Conv2D(None -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2D(None -> 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2D(None -> 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2D(None -> 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): Conv2D(None -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (5): Conv2D(None -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    

    2. Implementation

    import mxnet as mx
    from mxnet import nd, init
    from mxnet.gluon import nn
    
    vgg_spec = {
        16: ([2, 2, 3, 3, 3], [64, 128, 256, 512, 512])
    }
    
    extra_spec = {
        300: [((256, 1, 1, 0), (512, 3, 2, 1)),
              ((128, 1, 1, 0), (256, 3, 2, 1)),
              ((128, 1, 1, 0), (256, 3, 1, 0)),
              ((128, 1, 1, 0), (256, 3, 1, 0))]
    }
    
    layers, filters = vgg_spec[16]
    extras = extra_spec[300]
    
    
    class Normalize(nn.HybridBlock):
        """Normalize layer described in https://arxiv.org/abs/1512.02325.
    
        Parameters
        ----------
        n_channel : int
            Number of channels of input.
        initial : float
            Initial value for the rescaling factor.
        eps : float
            Small value to avoid division by zero.
    
        """
        def __init__(self, n_channel, initial=1, eps=1e-5):
            super(Normalize, self).__init__()
            self.eps = eps
            with self.name_scope():
                self.scale = self.params.get('normalize_scale', shape=(1, n_channel, 1, 1),
                                             init=mx.init.Constant(initial))
    
        def hybrid_forward(self, F, x, scale):
            x = F.L2Normalization(x, mode='channel', eps=self.eps)
            return F.broadcast_mul(x, scale)
    
    
    class VGG_atrous(nn.HybridBlock):
        def __init__(self):
            super(VGG_atrous, self).__init__()
    
            self.init = {
                'weight_initializer': init.Xavier(
                    rnd_type='gaussian', factor_type='out', magnitude=2),
                'bias_initializer': 'zeros'
            }
            with self.name_scope():
                init_scale = mx.nd.array([0.229, 0.224, 0.225]).reshape((1, 3, 1, 1)) * 255
                self.init_scale = self.params.get_constant('init_scale', init_scale)
                self.stages = nn.HybridSequential()
                for l, f in zip(layers, filters):
                    stage = nn.HybridSequential(prefix='')
                    with stage.name_scope():
                        for _ in range(l):
                            stage.add(nn.Conv2D(f, kernel_size=3, padding=1, **self.init))
                            stage.add(nn.Activation('relu'))
                    self.stages.add(stage)
    
                stage = nn.HybridSequential(prefix='dilated_')
                with stage.name_scope():
                    stage.add(nn.Conv2D(1024, kernel_size=3, padding=6, dilation=6, **self.init))
                    stage.add(nn.Activation('relu'))
                    stage.add(nn.Conv2D(1024, kernel_size=1, **self.init))
                    stage.add(nn.Activation('relu'))
    
                self.stages.add(stage)
                self.norm4 = Normalize(filters[3], 20)
    
                self.extras = nn.HybridSequential()
                for i, config in enumerate(extras):
                    extra = nn.HybridSequential(prefix='extra%d_'%(i))
                    with extra.name_scope():
                        for f, k, s, p in config:
                            extra.add(nn.Conv2D(f, k, s, p, **self.init))
                            extra.add(nn.Activation('relu'))
                    self.extras.add(extra)
    
        def hybrid_forward(self, F, x, init_scale):
            x = F.broadcast_mul(x, init_scale)
            assert len(self.stages) == 6
            outputs = []
            for stage in self.stages[:3]:
                x = stage(x)
                x = F.Pooling(x, pool_type='max', kernel=(2, 2), stride=(2, 2),
                              pooling_convention='full')
            x = self.stages[3](x)
            norm = self.norm4(x)
            outputs.append(norm)
            x = F.Pooling(x, pool_type='max', kernel=(2, 2), stride=(2, 2),
                          pooling_convention='full')
            x = self.stages[4](x)
            x = F.Pooling(x, pool_type='max', kernel=(3, 3), stride=(1, 1), pad=(1, 1),
                          pooling_convention='full')
            x = self.stages[5](x)
            outputs.append(x)
            for extra in self.extras:
                x = extra(x)
                outputs.append(x)
            return outputs
    
    
    class SSD(nn.HybridBlock):
        def __init__(self, num_classes):
            super(SSD, self).__init__()
    
            self.num_classes = num_classes
            self.sizes = [[.1, .141], [.2, .272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
            self.ratios = [[1, 2, .5], [1, 2, .5, 3, 1. / 3], [1, 2, .5, 3, 1. / 3], [1, 2, .5, 3, 1. / 3], \
                           [1, 2, .5], [1, 2, .5]]
    
            self.features = VGG_atrous()
    
            self.bbox_predictor = nn.HybridSequential()
            self.cls_predictor = nn.HybridSequential()
    
            for s, r in zip(self.sizes, self.ratios):
                num_anchors = len(s) + len(r) - 1  # anchors generated per spatial location
                self.bbox_predictor.add(nn.Conv2D(num_anchors * 4,
                         kernel_size=3, padding=1))
                self.cls_predictor.add(nn.Conv2D(num_anchors * (self.num_classes + 1),
                         kernel_size=3, padding=1))
    
        # Flatten each prediction to (batch, height*width*channels) so outputs from different scales can be concatenated
        def flatten_pred(self, pred):
            return pred.transpose((0, 2, 3, 1)).flatten()
    
        # Concatenate along the column axis
        def concat_preds(self, F, preds):
            return F.concat(*[self.flatten_pred(p) for p in preds], dim=1)
    
        def hybrid_forward(self, F, x):
            outputs = self.features(x)
            anchors, cls_preds, bbox_preds = [None] * 6, [None] * 6, [None] * 6
            for i, x in enumerate(outputs):
                cls_preds[i] = self.cls_predictor[i](x)
                bbox_preds[i] = self.bbox_predictor[i](x)
                anchors[i] = F.contrib.MultiBoxPrior(x, sizes=self.sizes[i], ratios=self.ratios[i])
    
            bbox_preds = self.concat_preds(F, bbox_preds)
            cls_preds = self.concat_preds(F, cls_preds).reshape((0, -1, self.num_classes + 1))
            anchors = F.concat(*anchors, dim=1)
    
            return anchors, bbox_preds, cls_preds
    
    
    def get_model(num_classes, pretrained_model=None, pretrained=False, pretrained_base=False, ctx=mx.gpu()):
        net = SSD(num_classes)
        if pretrained_base:
            net.initialize(init=init.Xavier(), ctx=ctx)
            pretrained_base_model = 'model/vgg16_atrous_300.params'
            net.features.load_parameters(pretrained_base_model, allow_missing=True)
        elif pretrained:
            net.load_parameters(pretrained_model, ctx=ctx)
        return net
    
    

    The pretrained weights are the ones provided by GluonCV.

    The network breaks down into a feature extractor plus bounding-box and class prediction layers.
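    As a sanity check on the printout above, a short sketch reproduces the predictor channel counts and the total anchor count. The feature-map resolutions (38, 19, 10, 5, 3, 1) are the standard SSD300 values and are an assumption here, since the article does not state them:

```python
# Anchors per spatial location: len(sizes) + len(ratios) - 1 for each feature map.
sizes = [[.1, .141], [.2, .272], [.37, .447], [.54, .619], [.71, .79], [.88, .961]]
ratios = [[1, 2, .5]] + [[1, 2, .5, 3, 1. / 3]] * 3 + [[1, 2, .5]] * 2
feat_sizes = [38, 19, 10, 5, 3, 1]   # assumed SSD300 feature-map resolutions

num_anchors = [len(s) + len(r) - 1 for s, r in zip(sizes, ratios)]
bbox_channels = [n * 4 for n in num_anchors]   # 4 offsets per anchor
cls_channels = [n * 2 for n in num_anchors]    # num_classes + 1 = 2 in the printout
total = sum(f * f * n for f, n in zip(feat_sizes, num_anchors))

print(num_anchors)     # [4, 6, 6, 6, 4, 4]
print(bbox_channels)   # [16, 24, 24, 24, 16, 16], matches the bbox_predictor printout
print(cls_channels)    # [8, 12, 12, 12, 8, 8], matches the cls_predictor printout
print(total)           # 8732 anchors per 300x300 image
```

    Note that the class-predictor channels in the printout (8, 12, ...) correspond to num_classes = 1; with the three mask classes, the class predictor emits num_anchors * (3 + 1) channels instead.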

    References

    https://zh.d2l.ai/chapter_computer-vision/ssd.html
    https://gluon-cv.mxnet.io/model_zoo/detection.html#ssd



    1. Predicting on a single image

    Implementation

    import os
    import argparse
    import matplotlib.pyplot as plt
    import mxnet as mx
    from mxnet import image, nd
    from tools.tools import try_gpu, import_module
    
    
    # Load a single test image
    def single_image_data_loader(filename, test_image_size=300):
        """
        加载测试用的图片,测试数据没有groundtruth标签
        """
    
        def reader():
            img_size = test_image_size
            file_path = os.path.join(filename)
            img = image.imread(file_path)
            img = image.imresize(img, img_size, img_size, 3).astype('float32')
    
            mean = [0.485, 0.456, 0.406]
            std = [0.229, 0.224, 0.225]
            mean = nd.array(mean).reshape((1, 1, -1))
            std = nd.array(std).reshape((1, 1, -1))
            out_img = (img / 255.0 - mean) / std
            out_img = out_img.transpose((2, 0, 1)).expand_dims(axis=0)    # h w c -> c h w, plus a batch dim
    
            yield out_img
        return reader
    
    
    # Predict objects in the image
    def predict(test_image, net, img, labels, threshold=0.3):
        anchors,bbox_preds,cls_preds= net(test_image)
        cls_probs = nd.SoftmaxActivation(cls_preds.transpose((0, 2, 1)), mode='channel')
        output = nd.contrib.MultiBoxDetection(cls_probs, bbox_preds, anchors,
                                              force_suppress=True, clip=True,
                                              threshold=0.5, nms_threshold=.45)
    
        idx = [i for i, row in enumerate(output[0]) if row[0].asscalar() != -1]
        if idx:
            output = output[0, idx]
            display(img, labels, output, threshold=threshold)
            return True
        else:
            return False
    
    
    # Draw bounding boxes with optional labels
    def show_bboxes(axes, bboxes, labels=None):
        for bbox in bboxes:
            bbox = bbox.asnumpy()
            rect = plt.Rectangle(
                xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
                fill=False, linewidth=2, color='w')
            axes.add_patch(rect)
            if labels:
                axes.text(rect.xy[0], rect.xy[1], labels,
                          horizontalalignment='center', verticalalignment='center', fontsize=8,
                          color='k', bbox=dict(facecolor='w', alpha=1))
    
    
    def display(img, labels, output, threshold):
        fig = plt.imshow(img.asnumpy())
        for row in output:
            score = row[1].asscalar()
            if score < threshold:
                continue
            h, w = img.shape[0:2]
            bbox = [row[2:6] * nd.array((w, h, w, h), ctx=row.context)]
            label = labels[int(row[0].asscalar())]
            show_bboxes(fig.axes, bbox, '%s-%.2f' % (label, score))
        plt.show()
    
    
    def parse_args():
        parser = argparse.ArgumentParser(description='predict the single image')
        parser.add_argument('--image-path', dest='img_path', help='image path',
                            default=None, type=str)
        parser.add_argument('--model', dest='model', help='choice model to use',
                            default='resnet_ssd', type=str)
        parser.add_argument('--model-params', dest='model_params', help='choice model params to use',
                            default='mask_resnet18_SSD_model.params', type=str)
        parser.add_argument('--class-names', dest='class_names', help='choice class to use',
                            default='without_mask,with_mask,mask_weared_incorrect', type=str)
        parser.add_argument('--image-shape', dest='image_shape', help='image shape',
                            default=512, type=int)
        args = parser.parse_args()
        return args
    
    
    if __name__ == '__main__':
        args = parse_args()
        # ctx = try_gpu()
        ctx = mx.cpu()
    
        img = image.imread(args.img_path).as_in_context(ctx)
        reader = single_image_data_loader(args.img_path, args.image_shape)
        labels = args.class_names.strip().split(',')
        class_nums = len(labels)
    
        model_path = os.path.join('model', args.model_params)
        net = import_module('model.'+args.model).get_model(class_nums, pretrained_model=model_path, pretrained=True, ctx=ctx)
    
        for x in reader():
            output = predict(x, net, img, labels)
            if not output:
                print('not found!')
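
    The filtering in predict() relies on the layout of MultiBoxDetection's decoded output: per image, one row per candidate of the form [class_id, score, xmin, ymin, xmax, ymax], with coordinates relative to [0, 1] and class_id set to -1 for suppressed rows. A small NumPy sketch with hypothetical values:

```python
import numpy as np

# Hypothetical decoded output for one image.
output = np.array([[ 0.00, 0.92, 0.10, 0.20, 0.35, 0.60],   # kept detection, class 0
                   [-1.00, 0.00, 0.00, 0.00, 0.00, 0.00],   # suppressed by NMS
                   [ 1.00, 0.41, 0.55, 0.30, 0.80, 0.75]])  # kept detection, class 1

keep = output[output[:, 0] != -1]                       # same filter as in predict()
boxes = keep[:, 2:6] * np.array([300, 300, 300, 300])   # scale to 300x300 pixels
print(keep.shape)   # (2, 6)
```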
    
    

    Results

    (detection result image)

    2. Real-time detection

    1. Implementation

    Unlike single-image prediction, real-time detection from a webcam needs two threads: one to read frames and one to run the model on them. With a single thread the pipeline stalls badly and cannot keep up in real time.
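    A minimal sketch of that producer-consumer pattern, with integer frame IDs standing in for real webcam frames:

```python
import threading
from collections import deque

lock = threading.Lock()
frames = deque(maxlen=10)   # bounded queue: old frames are dropped rather than piling up
results = []

def capture():
    # Stands in for the cap.read() loop of the webcam thread.
    for frame_id in range(100):
        with lock:
            frames.append(frame_id)   # deque(maxlen=10) evicts the oldest automatically

def process():
    # Stands in for the model thread: pop the newest frame and "detect" on it.
    while t1.is_alive() or frames:
        with lock:
            frame = frames.pop() if frames else None
        if frame is not None:
            results.append(frame * 2)

t1 = threading.Thread(target=capture)
t2 = threading.Thread(target=process)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(results) > 0)   # True: frames were processed concurrently
```

    The code below bounds the queue manually at 10 items; deque(maxlen=10) achieves the same bound automatically.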

    import cv2
    import time
    import threading
    from collections import deque
    import mxnet as mx
    from tools.tools import try_gpu, import_module
    lock = threading.Lock()
    
    
    def img_transform(img, img_size=500):
        img = mx.image.imresize(img, img_size, img_size, 3).astype('float32')
        orig_img = img.asnumpy().astype('uint8')
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        mean = mx.nd.array(mean).reshape((1, 1, -1))
        std = mx.nd.array(std).reshape((1, 1, -1))
        out_img = (img / 255.0 - mean) / std
        out_img = out_img.transpose((2, 0, 1)).expand_dims(axis=0)  # h w c -> c h w, plus a batch dim
    
        return out_img, orig_img
    
    
    # Predict objects in the frame
    def predict(test_image, net):
        anchors, bbox_preds, cls_preds = net(test_image)
        cls_probs = mx.nd.SoftmaxActivation(cls_preds.transpose((0, 2, 1)), mode='channel')
        output = mx.nd.contrib.MultiBoxDetection(cls_probs, bbox_preds, anchors,
                                                 force_suppress=True, clip=True,
                                                 threshold=0.5, nms_threshold=.45)
    
        idx = [i for i, row in enumerate(output[0]) if row[0].asscalar() != -1]
        if idx:
            return output[0, idx]
    
    
    # Webcam capture and display thread
    class WebcamThread(threading.Thread):
        def __init__(self, input_q, output, img_height, img_width, threshold=0.5, labels=None):
            super(WebcamThread, self).__init__()
            self._jobq = input_q
            self._output = output
            self._num = 0
            self.cap = cv2.VideoCapture(0)
            self.img_height = img_height
            self.img_width = img_width
            self.labels = labels
            self.threshold = threshold
    
        def run(self):
            start = time.time()
            cv2.namedWindow('camera', flags=cv2.WINDOW_NORMAL | cv2.WINDOW_FREERATIO)
            if not self.cap.isOpened():
                print('Failed to open the camera')
            while self.cap.isOpened():
                # compute FPS over 60-frame windows
                if self._num < 60:
                    self._num += 1
                else:
                    end = time.time()
                    fps = self._num / (end - start)
    
                    start = time.time()
                    self._num = 0
                    print('fps:', fps)
    
                ret, frame = self.cap.read()
                lock.acquire()
                if len(self._jobq) == 10:
                    self._jobq.popleft()  # drop the oldest frame when the queue is full
                self._jobq.append(frame)
                lock.release()
                frame = cv2.resize(frame, (self.img_width, self.img_height))
                if self._output[0] is not None:
                    output = self._output[0]
                    for row in output:
                        score = row[1].asscalar()
                        if score < self.threshold:
                            cv2.imshow('camera', frame)
                            continue
                        bounding_boxes = [
                            row[2:6] * mx.nd.array((self.img_width, self.img_height, self.img_width, self.img_height),
                                                   ctx=row.context)]
                        for bbox in bounding_boxes:
                            bbox = bbox.asnumpy().astype(int)  # OpenCV expects integer pixel coordinates
                            cv2.rectangle(frame, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (205, 0, 0), 2)
                            if self.labels:
                                label = self.labels[int(row[0].asscalar())]
                                font = cv2.FONT_HERSHEY_TRIPLEX
                                cv2.putText(frame, label, (bbox[0], bbox[1]), font, 0.8, (205, 0, 0), 1, cv2.LINE_8)
                            cv2.imshow('camera', frame)
                else:
                    cv2.imshow('camera', frame)
                if cv2.waitKey(1) == ord('q'):
                    # quit
                    break
            print('Capture thread exiting')
            cv2.destroyWindow('camera')
            self._jobq.clear()  # clear the queue when the capture thread exits
            self.cap.release()
    
    
    # Process frames coming from the webcam
    class ModelDealhread(threading.Thread):
        def __init__(self, input_q, output, img_size, ctx=mx.cpu()):
            super(ModelDealhread, self).__init__()
            self._jobq = input_q
            self._output = output
            self.img_size = img_size
            self.ctx = ctx
    
        def run(self):
            flag = False
            while True:
                if len(self._jobq) != 0:
                    lock.acquire()
                    im_new = self._jobq.pop()
                    lock.release()
    
                    frame = mx.nd.array(cv2.cvtColor(im_new, cv2.COLOR_BGR2RGB)).astype('uint8')
                    img, frame = img_transform(frame, img_size=self.img_size)
                    output = predict(img.as_in_context(self.ctx), net)
    
                    lock.acquire()
                    self._output[0] = output
                    lock.release()
                    # cv2.waitKey(500)
                    flag = True
                elif flag is True and len(self._jobq) == 0:
                    break
    
            print('Model thread exiting')
    
    
    if __name__ == "__main__":
        ctx = try_gpu()
        # net = vgg_ssd.get_model(3, pretrained_model='model/mask_SSD_model.params', pretrained=True,ctx=ctx)
        net = import_module('model.resnet_ssd').get_model(3, pretrained_model='model/mask_resnet18_SSD_model.params',
                                                          pretrained=True, ctx=ctx)
        net.hybridize()
    
        q = deque([], 10)   # bounded deque holding the most recent frames
        output_q = [None]   # latest model output
        labels = ['without_mask', 'with_mask', 'mask_weared_incorrect']
        th1 = WebcamThread(q, output_q, 500, 500, labels=labels)
        th2 = ModelDealhread(q, output_q, 500, ctx=ctx)

        # start both threads
        th1.start()
        th2.start()
    
        th1.join()
        th2.join()
    
    


    1. Implementation

    import mxnet as mx
    from mxnet import autograd, contrib, gluon, nd
    from train.smooth_l1 import smooth_l1, FocalLoss
    from utils import utils
    from tools.draw_details import draw_details
    from model.vgg_ssd import get_model
    
    import os
    import time
    import logging
    
    
    # Main training function
    def train(data_path, num_classes, data_size, batch_size, epochs, wd, momentum, lr, save_model_path, log_file_path, ctx=mx.cpu()):
    
        train_rec_path = os.path.join(data_path, 'train.rec')
        train_idx_path = os.path.join(data_path, 'train.idx')
        val_rec_path = os.path.join(data_path, 'val.rec')
    
        # Data augmentation pipeline
        augs = mx.image.CreateDetAugmenter(data_shape=(3, data_size, data_size),
                                           rand_crop=1,
                                           min_object_covered=0.9,
                                           aspect_ratio_range=(0.5, 2),
                                           area_range=(0.1, 1.5),
                                           max_attempts=100,
                                           rand_mirror=True,
                                           rand_gray=0.2,
                                           brightness=0.5,
                                           contrast=0.5,
                                           saturation=0.5,
                                           rand_pad=0.4,
                                           hue=0.5,
                                           mean=True,
                                           std=True,
                                           )
        # Training set
        train_iter = mx.image.ImageDetIter(
            path_imgidx=train_idx_path,
            path_imgrec=train_rec_path,
            batch_size=batch_size,
            data_shape=(3, data_size, data_size),
            shuffle=True,
            aug_list=augs,
        )
        # Validation set
        val_iter = mx.image.ImageDetIter(
            path_imgrec=val_rec_path,
            batch_size=batch_size,
            data_shape=(3, data_size, data_size),
            shuffle=False,
            mean=True,
            std=True
        )
        # Load the model
        net = get_model(num_classes, pretrained_base=True, ctx=ctx)
        net.collect_params().reset_ctx(ctx=ctx)
        net.hybridize()
    
        # lrs = mx.lr_scheduler.FactorScheduler(step=200, factor=0.8, stop_factor_lr=lr, base_lr=lr)
        # Optimizer
        trainer = gluon.Trainer(net.collect_params(), 'sgd',
                                {'learning_rate': lr, 'wd': wd, 'momentum': momentum})
        # Loss functions
        cls_loss = gluon.loss.SoftmaxCrossEntropyLoss()
        # cls_loss = FocalLoss()
        bbox_loss = smooth_l1()
        # Evaluation function
        def evaluate_accuracy(data_iter, net, ctx):
            """
            :param data_iter: validation data loader
            :param net: the network
            :param ctx: device(s) to run on
            :return: AP on the validation set
            """

            data_iter.reset()
            outs, labels = None, None
            for batch in data_iter:
                X = batch.data[0].as_in_context(ctx)
                Y = batch.label[0].as_in_context(ctx)

                anchors, bbox_preds, cls_preds = net(X)
                # Decode detections from the class probabilities and box offsets
                cls_probs = nd.SoftmaxActivation(cls_preds.transpose((0, 2, 1)), mode='channel')
                out = nd.contrib.MultiBoxDetection(cls_probs, bbox_preds, anchors,
                                                   force_suppress=True, clip=False, nms_threshold=0.45)
                if outs is None:
                    outs = out
                    labels = Y
                else:
                    outs = nd.concat(outs, out, dim=0)
                    labels = nd.concat(labels, Y, dim=0)

            # Evaluate over the whole validation set, not just the first batch
            AP = utils.evaluate_MAP(outs, labels)

            return AP
    	
        # Set up the training logger
        logging.basicConfig(format='%(asctime)s %(message)s')
        logger = logging.getLogger()
        logger.setLevel(logging.INFO)
    
        fh = logging.FileHandler(log_file_path, mode='w')
        # Output format for the file handler
        formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
        fh.setFormatter(formatter)
    
        logger.addHandler(fh)
    
        ce_metric = mx.metric.Loss('CrossEntropy')
        smoothl1_metric = mx.metric.Loss('SmoothL1')
    
        # Training loop
        for epoch in range(epochs):
    
            ce_metric.reset()
            smoothl1_metric.reset()
    
            train_iter.reset()  # rewind the data iterator
            btic = time.time()
    
            for i, batch in enumerate(train_iter):
                X = batch.data[0].as_in_context(ctx)
                Y = batch.label[0].as_in_context(ctx)
                with autograd.record():
                    # Generate multi-scale anchors and predict a class and offset for each
                    anchors, bbox_preds, cls_preds = net(X)
                    # Label each anchor with a class and an offset target
                    bbox_labels, bbox_masks, cls_labels = contrib.nd.MultiBoxTarget(
                        anchors, Y, cls_preds.transpose((0, 2, 1)),
                        negative_mining_ratio=3, negative_mining_thresh=.5)
                    # Compute the losses from predictions and targets
                    cls = cls_loss(cls_preds, cls_labels)
                    bbox = bbox_loss(bbox_preds * bbox_masks, bbox_labels * bbox_masks)
                    l = cls + bbox
                l.backward()
                trainer.step(batch_size)
    
                if i % 50 == 0:
                    ce_metric.update(0, cls)
                    smoothl1_metric.update(0, bbox)
                    name1, loss1 = ce_metric.get()
                    name2, loss2 = smoothl1_metric.get()
                    val_AP = evaluate_accuracy(val_iter, net, ctx)
                    logger.info('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.2e}, {}={:.2e}, val_AP={:.3f}'.format(
                        epoch, i, batch_size / (time.time() - btic), name1, loss1, name2, loss2, val_AP))
                btic = time.time()
    
        # Save the trained parameters
        net.save_parameters(save_model_path)
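
    The bbox_masks returned by MultiBoxTarget zero out anchors that matched no object, so only positive anchors contribute to the localization loss. A small NumPy sketch of that masking, with hypothetical values:

```python
import numpy as np

# Offsets for two anchors (4 values each), flattened the way MultiBoxTarget returns them.
bbox_preds  = np.array([[0.20, -0.10, 0.30, 0.00,  0.50, 0.50, 0.50, 0.50]])
bbox_labels = np.array([[0.25,  0.00, 0.20, 0.10,  0.00, 0.00, 0.00, 0.00]])
# First anchor matched a ground-truth box; the second is background.
bbox_masks  = np.array([[1., 1., 1., 1.,  0., 0., 0., 0.]])

# Multiplying both sides by the mask (as in the training loop above) makes the
# background anchor's error identically zero before the smooth-L1 loss sees it.
diff = bbox_preds * bbox_masks - bbox_labels * bbox_masks
print(diff[0, 4:])   # [0. 0. 0. 0.]
```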
    


    1. Loss functions

    The class loss is SoftmaxCrossEntropyLoss, with FocalLoss provided as an alternative; the bounding-box loss is smooth L1.

    Implementation

    from mxnet.gluon.loss import Loss
    
    
    class smooth_l1(Loss):
        def __init__(self, weight=None, batch_axis=0, **kwargs):
            super(smooth_l1, self).__init__(weight, batch_axis, **kwargs)
    
        def hybrid_forward(self, F, pred, label):
            loss = F.smooth_l1(pred-label, scalar=1.)
            return F.mean(loss, axis=self._batch_axis, exclude=True)
    
    
    class FocalLoss(Loss):
        def __init__(self, axis=-1, alpha=0.25, gamma=2, batch_axis=0, **kwargs):
            super(FocalLoss, self).__init__(None, batch_axis, **kwargs)
            self.alpha = alpha
            self.gamma = gamma
            self.axis = axis
    
        def hybrid_forward(self, F, y, label):
            y = F.softmax(y)
            pt = F.pick(y, label, axis=self.axis, keepdims=True)
            loss = -self.alpha * ((1 - pt) ** self.gamma) * F.log(pt)
            return F.mean(loss, axis=self._batch_axis, exclude=True)
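
    The piecewise function that F.smooth_l1 computes with scalar=1 (0.5*x^2 for |x| < 1, |x| - 0.5 otherwise) can be checked with a small NumPy sketch:

```python
import numpy as np

# NumPy version of the piecewise smooth-L1 that F.smooth_l1(x, scalar=1.) computes.
def smooth_l1_np(x):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

errors = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(smooth_l1_np(errors))   # 1.5, 0.125, 0.0, 0.125, 1.5
```

    The quadratic region keeps gradients small near zero, while the linear region caps the influence of badly mislocalized boxes.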
    

    2. Evaluation function

    The evaluation computes recall, precision, and AP for each class.

    Implementation

    import numpy as np

    # Excerpt of the per-class VOC-style evaluation; the signature below is
    # reconstructed from the variables the fragment uses.
    def voc_eval(detpath, recs, imagenames, classname, ovthresh=0.5, use_07_metric=False):
        class_recs = {}
        npos = 0
        for imagename in imagenames:
            R = [obj for obj in recs[imagename] if obj['name'] == classname]
            bbox = np.array([x['bbox'] for x in R])
            difficult = np.array([x['difficult'] for x in R]).astype(bool)
            det = [False] * len(R)  # flags ground truths already matched, to catch duplicate detections
            npos = npos + sum(~difficult)
            class_recs[imagename] = {'bbox': bbox,
                                     'difficult': difficult,
                                     'det': det}
    
        # read dets
        detfile = detpath.format(classname)
        with open(detfile, 'r') as f:
            lines = f.readlines()
    
        splitlines = [x.strip().split(' ') for x in lines]
        image_ids = [x[0] for x in splitlines]
        confidence = np.array([float(x[1]) for x in splitlines])
        BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
    
        # sort by confidence 
        sorted_ind = np.argsort(-confidence)
        BB = BB[sorted_ind, :]
        image_ids = [image_ids[x] for x in sorted_ind]
    
        # go down dets and mark TPs and FPs
        nd = len(image_ids) 
        tp = np.zeros(nd)
        fp = np.zeros(nd)
        for d in range(nd):
            R = class_recs[image_ids[d]]
            bb = BB[d, :].astype(float)
            ovmax = -np.inf 
            BBGT = R['bbox'].astype(float)
    
            if BBGT.size > 0:
                # compute overlaps
                # intersection
                ixmin = np.maximum(BBGT[:, 0], bb[0])
                iymin = np.maximum(BBGT[:, 1], bb[1])
                ixmax = np.minimum(BBGT[:, 2], bb[2])
                iymax = np.minimum(BBGT[:, 3], bb[3])
                iw = np.maximum(ixmax - ixmin + 1., 0.)
                ih = np.maximum(iymax - iymin + 1., 0.)
                inters = iw * ih
    
                # union
                uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                       (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                       (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
    
                overlaps = inters / uni
                ovmax = np.max(overlaps)
                jmax = np.argmax(overlaps)
    
            if ovmax > ovthresh:
                if not R['difficult'][jmax]: 
                    if not R['det'][jmax]:
                        tp[d] = 1.
                    R['det'][jmax] = 1  # mark this GT as matched; a later detection of it counts as FP
                    else:
                        fp[d] = 1.
            else:
                fp[d] = 1.
    
        # compute precision recall
        fp = np.cumsum(fp) 
        tp = np.cumsum(tp)
        rec = tp / float(npos)
        # avoid divide by zero in case the first detection matches a difficult
        # ground truth
        prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
        ap = voc_ap(rec, prec, use_07_metric)
    
        return rec, prec, ap
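    The overlap computation above follows the VOC convention of inclusive pixel coordinates, which is why every width and height gets `+ 1.`. A small standalone numpy sketch of the same IoU computation, with made-up boxes:

    ```python
    import numpy as np

    def voc_iou(bb, BBGT):
        """IoU between one detection bb and an array of GT boxes BBGT,
        using the VOC '+1' convention for inclusive pixel coordinates
        ([xmin, ymin, xmax, ymax])."""
        ixmin = np.maximum(BBGT[:, 0], bb[0])
        iymin = np.maximum(BBGT[:, 1], bb[1])
        ixmax = np.minimum(BBGT[:, 2], bb[2])
        iymax = np.minimum(BBGT[:, 3], bb[3])
        iw = np.maximum(ixmax - ixmin + 1.0, 0.0)
        ih = np.maximum(iymax - iymin + 1.0, 0.0)
        inters = iw * ih
        uni = ((bb[2] - bb[0] + 1.0) * (bb[3] - bb[1] + 1.0)
               + (BBGT[:, 2] - BBGT[:, 0] + 1.0) * (BBGT[:, 3] - BBGT[:, 1] + 1.0)
               - inters)
        return inters / uni

    gt = np.array([[0, 0, 9, 9], [20, 20, 29, 29]], dtype=float)
    ious = voc_iou(np.array([0, 0, 9, 9], dtype=float), gt)
    # identical boxes give IoU 1.0; disjoint boxes give IoU 0.0
    ```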
    

    Computing the AP value

    def voc_ap(rec, prec, use_07_metric=False):
        """Compute VOC AP given precision and recall. If use_07_metric is true, uses
        the VOC 07 11-point method (default:False).
        """
        if use_07_metric:
            # 11 point metric
            ap = 0.
            for t in np.arange(0., 1.1, 0.1):
                if np.sum(rec >= t) == 0:
                    p = 0 
                else:
                    p = np.max(prec[rec >= t])
                ap = ap + p / 11.
        else:
            # correct AP calculation
            # first append sentinel values at the end
            mrec = np.concatenate(([0.], rec, [1.]))
            mpre = np.concatenate(([0.], prec, [0.]))
    
            # compute the precision envelope
            for i in range(mpre.size - 1, 0, -1):
                mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
            
            i = np.where(mrec[1:] != mrec[:-1])[0]
    
            # and sum (\Delta recall) * prec
            ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])  # area under the PR envelope
        return ap
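    To see how the two metrics differ, here is a compact copy of `voc_ap` run on a toy precision/recall curve (the numbers are illustrative only, not from the dataset):

    ```python
    import numpy as np

    def voc_ap(rec, prec, use_07_metric=False):
        # compact copy of the function above
        if use_07_metric:
            ap = 0.0
            for t in np.arange(0.0, 1.1, 0.1):
                p = np.max(prec[rec >= t]) if np.sum(rec >= t) > 0 else 0.0
                ap += p / 11.0
            return ap
        mrec = np.concatenate(([0.0], rec, [1.0]))
        mpre = np.concatenate(([0.0], prec, [0.0]))
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
        i = np.where(mrec[1:] != mrec[:-1])[0]
        return np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])

    rec = np.array([0.5, 1.0])
    prec = np.array([1.0, 0.5])
    ap_area = voc_ap(rec, prec)                      # exact area: 0.75
    ap_11pt = voc_ap(rec, prec, use_07_metric=True)  # 11-point: 8.5/11
    ```

    The 11-point metric samples the envelope at fixed recall thresholds, so it generally differs slightly from the exact area under the interpolated curve.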
    

    Computing the mAP value

    def mAP():
        detpath, annopath, imagesetfile, cachedir, class_path = get_dir('kitti')

        mAP = 0.0
        class_list = get_classlist(class_path)
        for classname in class_list:
            rec, prec, ap = voc_eval(detpath,
                                     annopath,
                                     imagesetfile,
                                     classname,
                                     cachedir,
                                     ovthresh=0.5,
                                     use_07_metric=False,
                                     kitti=True)
            print('on {}, the ap is {}, recall is {}, precision is {}'.format(classname, ap, rec[-1], prec[-1]))
            mAP += ap

        mAP = mAP / len(class_list)
        return mAP
    

    References

    https://zhuanlan.zhihu.com/p/43068926
