  • An Introduction to ResNet

    2018-07-10 19:37:33

    An Introduction to ResNet

     

     

    1 Overview

        ResNet (Residual Neural Network) was proposed by Kaiming He and three colleagues at Microsoft Research. Using residual units they successfully trained a 152-layer network and won the ILSVRC 2015 classification competition with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. ResNet's structure greatly speeds up the training of very deep networks and also noticeably improves accuracy. The idea generalizes well and can even be applied directly inside Inception-style networks.

        The main idea of ResNet is to add direct (shortcut) paths to the network, which is closely related to the Highway Network. Earlier architectures apply a nonlinear transformation to the input of each layer, whereas a Highway Network lets a certain proportion of the previous layer's output pass through unchanged. ResNet follows a similar idea and lets the original input information be passed directly to later layers.

        With this design, a group of layers no longer has to learn the entire output; it only has to learn the residual relative to the output of the preceding layers: if H(x) is the desired mapping, the stacked layers fit F(x) = H(x) - x and the unit outputs F(x) + x. This is why ResNet is called a residual network.

    2 Key Idea

        The central contribution is residual learning. In a traditional convolutional or fully connected network, information is inevitably lost or degraded as it passes from layer to layer, and gradients can vanish or explode, which makes very deep networks hard to train. ResNet alleviates this by routing the input directly to the output through a bypass, preserving the information; the network then only has to learn the difference between input and output, which simplifies the learning target and reduces its difficulty. Compared with VGGNet, ResNet's most visible difference is the many bypass paths that connect the input of a group of layers directly to its output; these paths are called shortcuts or skip connections.

    3 Network Structure

        Two kinds of residual modules are used in ResNet: one stacks two 3×3 convolutions in series, and the other stacks a 1×1, a 3×3 and another 1×1 convolution (the bottleneck design).

        ResNet comes in different depths; the 50-layer, 101-layer and 152-layer versions are the most commonly used, and all of them are built by stacking these residual modules.
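        The implementation in the next section builds the bottleneck (1×1, 3×3, 1×1) variant with tf.contrib.slim. For comparison, a minimal sketch of the simpler two-3×3 variant in the same style might look like the following (a hand-written illustration, not part of the referenced implementation; basic_unit is a hypothetical name):

    def basic_unit(net, depth, stride, scope='basic_unit'):
      #Two 3x3 convolutions plus an identity (or 1x1 projection) shortcut.
      with tf.variable_scope(scope):
        depth_in = slim.utils.last_dimension(net.get_shape(), min_rank=4)
        if depth == depth_in and stride == 1:
          shortcut = net
        else:
          shortcut = slim.conv2d(net, depth, [1, 1], stride=stride,
                                 activation_fn=None, scope='shortcut')
        residual = slim.conv2d(net, depth, [3, 3], stride=stride, scope='conv1')
        residual = slim.conv2d(residual, depth, [3, 3], stride=1,
                               activation_fn=None, scope='conv2')
        return tf.nn.relu(shortcut + residual)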

    4 Code Implementation

     

    #%%
    # Copyright 2016 The TensorFlow Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    # ==============================================================================
    """
    
    Typical use:
    
       from tensorflow.contrib.slim.nets import resnet_v2
    
    ResNet-101 for image classification into 1000 classes:
    
       # inputs has shape [batch, 224, 224, 3]
       with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
          net, end_points = resnet_v2.resnet_v2_101(inputs, 1000)
    
    ResNet-101 for semantic segmentation into 21 classes:
    
       # inputs has shape [batch, 513, 513, 3]
       with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
          net, end_points = resnet_v2.resnet_v2_101(inputs,
                                                    21,
                                                    global_pool=False,
                                                    output_stride=16)
    """
    import collections
    import tensorflow as tf
    slim = tf.contrib.slim
    
    
    #collections.namedtuple creates a lightweight tuple subclass with a fixed set of
    #fields that can be accessed by name instead of by index.
    #Block is therefore effectively a class with the attributes scope, unit_fn and args.
    class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
      """A named tuple describing a ResNet block.
    
      Its parts are:
        scope: The scope of the `Block`.
        unit_fn: The ResNet unit function which takes as input a `Tensor` and
          returns another `Tensor` with the output of the ResNet unit.
        args: A list of length equal to the number of units in the `Block`. The list
          contains one (depth, depth_bottleneck, stride) tuple for each unit in the
          block to serve as argument to unit_fn.
      """
    
    
    def subsample(inputs, factor, scope=None):
      """Subsamples the input along the spatial dimensions.
    
      Args:
        inputs: A `Tensor` of size [batch, height_in, width_in, channels].
        factor: The subsampling factor.
        scope: Optional variable_scope.
    
      Returns:
        output: A `Tensor` of size [batch, height_out, width_out, channels] with the
          input, either intact (if factor == 1) or subsampled (if factor > 1).
      """
      if factor == 1:
        return inputs
      else:
        return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
    
    
    def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
      """Strided 2-D convolution with 'SAME' padding.
    
      When stride > 1, then we do explicit zero-padding, followed by conv2d with
      'VALID' padding.
    
      Note that
    
         net = conv2d_same(inputs, num_outputs, 3, stride=stride)
    
      is equivalent to
    
         net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
         net = subsample(net, factor=stride)
    
      whereas
    
         net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
    
      is different when the input's height or width is even, which is why we add the
      current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
    
      Args:
        inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
        num_outputs: An integer, the number of output filters.
        kernel_size: An int with the kernel_size of the filters.
        stride: An integer, the output stride.
        scope: Scope.
    
      Returns:
        output: A 4-D tensor of size [batch, height_out, width_out, channels] with
          the convolution output.
      """
      #conv2d_same keeps the 'SAME' spatial relationship between input and output.
      #With stride 1 we convolve directly; otherwise we compute the explicit padding
      #around the input first and then convolve with 'VALID' padding.
      if stride == 1:
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=1,
                           padding='SAME', scope=scope)
      else:
        #kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2
        pad_end = pad_total - pad_beg
        inputs = tf.pad(inputs,
                        [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
        return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
                           padding='VALID', scope=scope)
    
    
    @slim.add_arg_scope
    def stack_blocks_dense(net, blocks,
                           outputs_collections=None):
      """Stacks ResNet `Blocks` and controls output feature density.
    
      First, this function creates scopes for the ResNet in the form of
      'block_name/unit_1', 'block_name/unit_2', etc.
    
    
      Args:
        net: A `Tensor` of size [batch, height, width, channels].
        blocks: A list of length equal to the number of ResNet `Blocks`. Each
          element is a ResNet `Block` object describing the units in the `Block`.
        outputs_collections: Collection to add the ResNet block outputs.
    
      Returns:
        net: Output tensor 
    
      """
      #Build every residual unit of every block in turn, accumulating the result in net.
      for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
          for i, unit in enumerate(block.args):
    
            with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
              unit_depth, unit_depth_bottleneck, unit_stride = unit
              net = block.unit_fn(net,
                                  depth=unit_depth,
                                  depth_bottleneck=unit_depth_bottleneck,
                                  stride=unit_stride)
          net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
          
      return net
    
    
    def resnet_arg_scope(is_training=True,
                         weight_decay=0.0001,
                         batch_norm_decay=0.997,
                         batch_norm_epsilon=1e-5,
                         batch_norm_scale=True):
      """Defines the default ResNet arg scope.
    
      TODO(gpapan): The batch-normalization related default values above are
        appropriate for use in conjunction with the reference ResNet models
        released at https://github.com/KaimingHe/deep-residual-networks. When
        training ResNets from scratch, they might need to be tuned.
    
      Args:
        is_training: Whether or not we are training the parameters in the batch
          normalization layers of the model.
        weight_decay: The weight decay to use for regularizing the model.
        batch_norm_decay: The moving average decay when estimating layer activation
          statistics in batch normalization.
        batch_norm_epsilon: Small constant to prevent division by zero when
          normalizing activations by their variance in batch normalization.
        batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
          activations in the batch normalization layer.
    
      Returns:
        An `arg_scope` to use for the resnet models.
      """
      batch_norm_params = {
          'is_training': is_training,
          'decay': batch_norm_decay,
          'epsilon': batch_norm_epsilon,
          'scale': batch_norm_scale,
          'updates_collections': tf.GraphKeys.UPDATE_OPS,
      }
    
      with slim.arg_scope(
          [slim.conv2d],
          weights_regularizer=slim.l2_regularizer(weight_decay),
          weights_initializer=slim.variance_scaling_initializer(),
          activation_fn=tf.nn.relu,
          normalizer_fn=slim.batch_norm,
          normalizer_params=batch_norm_params):
        with slim.arg_scope([slim.batch_norm], **batch_norm_params):
          # The following implies padding='SAME' for pool1, which makes feature
          # alignment easier for dense prediction tasks. This is also used in
          # https://github.com/facebook/fb.resnet.torch. However the accompanying
          # code of 'Deep Residual Learning for Image Recognition' uses
          # padding='VALID' for pool1. You can switch to that choice by setting
          # slim.arg_scope([slim.max_pool2d], padding='VALID').
          with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
            return arg_sc
    
    
    
    
    @slim.add_arg_scope
    def bottleneck(inputs, depth, depth_bottleneck, stride,
                   outputs_collections=None, scope=None):
      """Bottleneck residual unit variant with BN before convolutions.
    
      This is the full preactivation residual unit variant proposed in [2]. See
      Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck
      variant which has an extra bottleneck layer.
    
      When putting together two consecutive ResNet blocks that use this unit, one
      should use stride = 2 in the last unit of the first block.
    
      Args:
        inputs: A tensor of size [batch, height, width, channels].
        depth: The depth of the ResNet unit output.
        depth_bottleneck: The depth of the bottleneck layers.
        stride: The ResNet unit's stride. Determines the amount of downsampling of
          the units output compared to its input.
        outputs_collections: Collection to add the ResNet unit output.
        scope: Optional variable_scope.
    
      Returns:
        The ResNet unit's output.
      """
      with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
        preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')
        if depth == depth_in:
          shortcut = subsample(inputs, stride, 'shortcut')
        else:
          shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
                                 normalizer_fn=None, activation_fn=None,
                                 scope='shortcut')
    
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
                               scope='conv1')
        residual = conv2d_same(residual, depth_bottleneck, 3, stride,
                                            scope='conv2')
        residual = slim.conv2d(residual, depth, [1, 1], stride=1,
                               normalizer_fn=None, activation_fn=None,
                               scope='conv3')
    
        output = shortcut + residual
    
        return slim.utils.collect_named_outputs(outputs_collections,
                                                sc.name,
                                                output)
    
    
    def resnet_v2(inputs,
                  blocks,
                  num_classes=None,
                  global_pool=True,
                  include_root_block=True,
                  reuse=None,
                  scope=None):
      """Generator for v2 (preactivation) ResNet models.
    
      This function generates a family of ResNet v2 models. See the resnet_v2_*()
      methods for specific model instantiations, obtained by selecting different
      block instantiations that produce ResNets of various depths.
    
    
      Args:
        inputs: A tensor of size [batch, height_in, width_in, channels].
        blocks: A list of length equal to the number of ResNet blocks. Each element
          is a resnet_utils.Block object describing the units in the block.
        num_classes: Number of predicted classes for classification tasks. If None
          we return the features before the logit layer.
        include_root_block: If True, include the initial convolution followed by
          max-pooling, if False excludes it. If excluded, `inputs` should be the
          results of an activation-less convolution.
        reuse: whether or not the network and its variables should be reused. To be
          able to reuse 'scope' must be given.
        scope: Optional variable_scope.
    
    
      Returns:
        net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
          If global_pool is False, then height_out and width_out are reduced by a
          factor of output_stride compared to the respective height_in and width_in,
          else both height_out and width_out equal one. If num_classes is None, then
          net is the output of the last ResNet block, potentially after global
          average pooling. If num_classes is not None, net contains the pre-softmax
          activations.
        end_points: A dictionary from components of the network to the corresponding
          activation.
    
      Raises:
        ValueError: If the target output_stride is not valid.
      """
      with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_points'
        with slim.arg_scope([slim.conv2d, bottleneck,
                             stack_blocks_dense],
                            outputs_collections=end_points_collection):
          net = inputs
          if include_root_block:
            # We do not include batch normalization or activation functions in conv1
            # because the first ResNet unit will perform these. Cf. Appendix of [2].
            with slim.arg_scope([slim.conv2d],
                                activation_fn=None, normalizer_fn=None):
              net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
            net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
          net = stack_blocks_dense(net, blocks)
          # This is needed because the pre-activation variant does not have batch
          # normalization or activation functions in the residual unit output. See
          # Appendix of [2].
          net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
          if global_pool:
            # Global average pooling.
            net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
          if num_classes is not None:
            net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                              normalizer_fn=None, scope='logits')
          # Convert end_points_collection into a dictionary of end_points.
          end_points = slim.utils.convert_collection_to_dict(end_points_collection)
          if num_classes is not None:
            end_points['predictions'] = slim.softmax(net, scope='predictions')
          return net, end_points
    
    
    
    def resnet_v2_50(inputs,
                     num_classes=None,
                     global_pool=True,
                     reuse=None,
                     scope='resnet_v2_50'):
      """ResNet-50 model of [1]. See resnet_v2() for arg and return description."""
      blocks = [
          Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
          Block(
              'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
          Block(
              'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
          Block(
              'block4', bottleneck, [(2048, 512, 1)] * 3)]
      return resnet_v2(inputs, blocks, num_classes, global_pool,
                       include_root_block=True, reuse=reuse, scope=scope)
    
    
    def resnet_v2_101(inputs,
                      num_classes=None,
                      global_pool=True,
                      reuse=None,
                      scope='resnet_v2_101'):
      """ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
      blocks = [
          Block(
              'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
          Block(
              'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
          Block(
              'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
          Block(
              'block4', bottleneck, [(2048, 512, 1)] * 3)]
      return resnet_v2(inputs, blocks, num_classes, global_pool,
                       include_root_block=True, reuse=reuse, scope=scope)
    
    
    def resnet_v2_152(inputs,
                      num_classes=None,
                      global_pool=True,
                      reuse=None,
                      scope='resnet_v2_152'):
      """ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
      blocks = [
          Block(
              'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
          Block(
              'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
          Block(
              'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
          Block(
              'block4', bottleneck, [(2048, 512, 1)] * 3)]
      return resnet_v2(inputs, blocks, num_classes, global_pool,
                       include_root_block=True, reuse=reuse, scope=scope)
    
    
    def resnet_v2_200(inputs,
                      num_classes=None,
                      global_pool=True,
                      reuse=None,
                      scope='resnet_v2_200'):
      """ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
      blocks = [
          Block(
              'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
          Block(
              'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
          Block(
              'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
          Block(
              'block4', bottleneck, [(2048, 512, 1)] * 3)]
      return resnet_v2(inputs, blocks, num_classes, global_pool,
                       include_root_block=True, reuse=reuse, scope=scope)
    
      
    from datetime import datetime
    import math
    import time
    def time_tensorflow_run(session, target, info_string):
        num_steps_burn_in = 10
        total_duration = 0.0
        total_duration_squared = 0.0
        for i in range(num_batches + num_steps_burn_in):
            start_time = time.time()
            _ = session.run(target)
            duration = time.time() - start_time
            if i >= num_steps_burn_in:
                if not i % 10:
                    print ('%s: step %d, duration = %.3f' %
                           (datetime.now(), i - num_steps_burn_in, duration))
                total_duration += duration
                total_duration_squared += duration * duration
        mn = total_duration / num_batches
        vr = total_duration_squared / num_batches - mn * mn
        sd = math.sqrt(vr)
        print ('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
               (datetime.now(), info_string, num_batches, mn, sd))
        
    # Benchmark a single forward pass of ResNet-152 on a batch of random images.
    batch_size = 32
    height, width = 224, 224
    num_batches = 100
    inputs = tf.random_uniform((batch_size, height, width, 3))
    with slim.arg_scope(resnet_arg_scope(is_training=False)):
      net, end_points = resnet_v2_152(inputs, 1000)
    
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    time_tensorflow_run(sess, net, "Forward")
    
    

     

    5 References

     

    [1] Huang Wenjian, Tang Yuan. TensorFlow实战 (TensorFlow in Practice). Beijing: Publishing House of Electronics Industry, 2017.

    [2]https://arxiv.org/abs/1512.03385

    [3]https://github.com/tensorflow/models/blob/master/research/slim/nets/resnet_v2.py

     

  • ResNet

    2021-01-09 02:43:23
    Shouldn't the first convolution layer of ResNet use a 7x7 kernel? Why does this implementation use 3x3? (Question from the open-source project kuangliu/pytorch-cifar.)
  • resnet

    2020-09-21 10:59:34

    resnet


    The Idea


    During backpropagation a neural network has to propagate gradients backwards layer by layer, and as the network gets deeper the gradient gradually vanishes (with a sigmoid activation, for example, a signal of magnitude 1 has its gradient shrunk to at most 0.25 of its value each time it passes back through a layer; the more layers, the stronger the decay), so the weights of the earlier layers can no longer be adjusted effectively.
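    As a rough back-of-the-envelope check of the decay described above (a quick sketch assuming the worst case in which every layer multiplies the gradient by the sigmoid's maximum derivative of 0.25):

    for n in (5, 10, 20):
        print(n, 0.25 ** n)   # 5 -> ~1e-3, 10 -> ~1e-6, 20 -> ~1e-12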

    As the network deepens and the accumulated parameters grow, gradients become more and more prone to vanishing or exploding. So although a deeper network has stronger expressive power in principle, simply adding layers eventually hurts the final result. ResNet's idea is to add a short-circuit-like path whenever the network is deepened, so that the extra depth does not degrade the final performance.


    With this shortcut structure the intermediate layers only have to model a residual, which reduces what they need to learn and makes much deeper networks feasible to train.

    Example:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ResBlk(nn.Module):
        """
        resnet block
        """

        def __init__(self, ch_in, ch_out):  # ch_in and ch_out need not match, e.g. ch_in=64 and ch_out=256
            """
            :param ch_in:
            :param ch_out:
            """
            super(ResBlk, self).__init__()
    
            self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=1, padding=1)
            self.bn1 = nn.BatchNorm2d(ch_out)
            self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
            self.bn2 = nn.BatchNorm2d(ch_out)
    
            self.extra = nn.Sequential()
            if ch_out != ch_in:  # if the input and output channel counts differ, project the input to ch_out channels
                # [b, ch_in, h, w] => [b, ch_out, h, w]
                self.extra = nn.Sequential(
                    nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=1),
                    nn.BatchNorm2d(ch_out)
                )
    
    
        def forward(self, x):
            """
            :param x: [b, ch, h, w]
            :return:
            """
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # short cut.
            # extra module: [b, ch_in, h, w] => [b, ch_out, h, w]
            # element-wise add:
            out = self.extra(x) + out
    
            return out
    
    
    

    [Figure: the two residual block designs. Left: two 3x3 convolutions (ResNet-34); right: the 1x1, 3x3, 1x1 bottleneck (ResNet-50/101/152)]

    The two structures target ResNet-34 (left) and ResNet-50/101/152 (right) respectively, and the main purpose of the right-hand design is to cut down the number of parameters. The left design uses two 3x3x256 convolutions, costing 3x3x256x256x2 = 1,179,648 parameters. The right design first uses a 1x1 convolution to reduce the 256-channel input to 64 channels, applies a 3x3 convolution, and finally restores 256 channels with another 1x1 convolution, for a total of 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632 parameters, roughly 1/16.94 of the left design. The bottleneck therefore exists to reduce the parameter count and with it the amount of computation.
    The plain design is used for ResNets with 34 layers or fewer (left); for deeper networks such as ResNet-101, the bottleneck design (right) is used to keep computation and parameters manageable.
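    The arithmetic above can be double-checked in a few lines of Python (a quick sketch that counts only convolution weights, ignoring biases and batch-norm parameters):

    basic = 3 * 3 * 256 * 256 * 2                          # two 3x3 convs on 256 channels
    bottleneck = 1*1*256*64 + 3*3*64*64 + 1*1*64*256       # 1x1 reduce, 3x3, 1x1 restore
    print(basic, bottleneck, basic / bottleneck)           # 1179648 69632 16.94...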

    [Figure: layer ordering inside a residual unit (weight, BN, ReLU, addition)]

    Here 'weight' denotes a convolution layer, 'BN' a Batch Normalization layer, 'ReLU' the activation, and 'addition' the element-wise sum.

    Reference: https://blog.csdn.net/chenyuping333/article/details/82344334

  • ResNet Explained

    2018-01-14 18:04:00

    ResNet was proposed in 2015 and took first place in the ImageNet classification task. Because it combines simplicity with practicality, many later methods are built on top of ResNet-50 or ResNet-101; detection, segmentation and recognition all make heavy use of ResNet, and AlphaZero used it as well, so it is clearly very useful in practice.
    Below we look at ResNet from a practical point of view.

    1. Why ResNet Matters

    As networks were made deeper, training-set accuracy was observed to drop, and we can be sure this is not caused by overfitting (with overfitting, training accuracy should be very high). To address this degradation problem the authors proposed a new kind of network, the deep residual network, which allows the network to be made as deep as needed by introducing the new structure shown in Figure 1.
    A question for the reader:
    What exactly is the residual?
    ResNet introduces two mappings: the identity mapping, which is the curved arc in Figure 1, and the residual mapping, which is everything except that arc, so the final output is y = F(x) + x.
    The identity mapping is, as the name says, the input itself, i.e. the x in the formula, while the residual mapping is the remaining part, i.e. y - x, so the residual is the F(x) term.
    Why can ResNet solve the problem of accuracy degrading as the network gets deeper?
    Besides the experimental evidence:
    [Table 1: ResNet results on ImageNet]
    In theory, ResNet offers two options for the degradation problem, the identity mapping and the residual mapping. If the network has already reached its optimum, then as more layers are stacked on top the residual mapping can be pushed to 0, leaving only the identity mapping; in theory the network then stays at the optimum and its performance no longer degrades as depth increases.
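    A tiny numerical sketch of that argument (assuming PyTorch, with one convolution standing in for the whole residual branch F): if the weights of the residual branch are driven to zero, the unit collapses to an identity mapping, so the added layers cannot make things worse.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 8, 8)
    f = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # stand-in for the residual branch F
    nn.init.zeros_(f.weight)
    nn.init.zeros_(f.bias)                            # residual mapping pushed to 0
    y = f(x) + x                                      # y = F(x) + x
    print(torch.allclose(y, x))                       # True: the unit acts as the identity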

    2. ResNet Structure

    ResNet uses a kind of connection called a "shortcut connection"; as the name suggests, a shortcut is a way of "cutting across", and the figure below gives the rough idea:
    [Figure 1: Shortcut connection]
    This figure is from the paper. The curved arc is the so-called shortcut connection, which is also the identity mapping mentioned in the paper, and this picture captures the essence of ResNet. The residual modules actually used are of course not quite this simple; the paper proposes two designs:
    [Figure 2: The two ResNet unit designs]
    The two designs target ResNet-34 (left) and ResNet-50/101/152 (right); the whole structure is usually called a "building block", and the right-hand one is also known as the "bottleneck design". Its purpose is obvious: to reduce the number of parameters. The first 1x1 convolution reduces the 256 channels to 64, and the last 1x1 convolution restores them, giving 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632 parameters in total; without the bottleneck, two 3x3x256 convolutions would cost 3x3x256x256x2 = 1,179,648 parameters, about 16.94 times more.
    The plain design is used for ResNets with 34 layers or fewer, while the bottleneck design is normally used for deeper networks such as ResNet-101, the (practical) goal being to cut computation and parameters.

    Another question for the reader:
    As in Figure 1, what happens if F(x) and x have different numbers of channels? F(x) and x are added along the channel dimension, so how can they be added if the channel counts differ?
    Depending on whether the channel counts match, there are two cases to consider, as shown below:
    [Figure 3: The two kinds of shortcut connection]
    As Figure 3 shows, there are clearly two kinds of connection, drawn as solid and dashed lines.
    In the solid-line connections (between the first and third pink blocks), both ends perform 3x3, 64-channel convolutions, so the channel counts match and the computation is simply:
    y = F(x) + x
    In the dashed-line connections (between the first and third green blocks), the two ends perform 3x3x64 and 3x3x128 convolutions respectively, so the channel counts differ (64 vs. 128) and the computation becomes:
    y = F(x) + Wx
    where W is a convolution used to adjust the channel dimension of x; a code sketch of this case follows below.
    Two concrete examples:
    [Figure 4: Examples of the two shortcut connections (left: matching channels, right: mismatched channels)]
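    A minimal sketch of the dashed-line case (assuming PyTorch; here a 1x1 convolution plays the role of W, matching both the channel count and the downsampled spatial size):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 56, 56)                       # input of the dashed-line unit
    residual = nn.Sequential(                            # F(x): the first 3x3 conv downsamples
        nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
    )
    w = nn.Conv2d(64, 128, kernel_size=1, stride=2)      # W: match channels and resolution
    y = residual(x) + w(x)                               # y = F(x) + Wx
    print(y.shape)                                       # torch.Size([1, 128, 28, 28])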

    3. ResNet-50 and ResNet-101

    ResNet-50 and ResNet-101 deserve special mention simply because they appear so often, so their concrete structure is given here:
    [Table 2: ResNet architectures at different depths]
    Looking at Table 2, five depths of ResNet are defined: 18, 34, 50, 101 and 152. The leftmost part of the table shows that every network is divided into five stages, conv1, conv2_x, conv3_x, conv4_x and conv5_x, and later papers use exactly these names to refer to the stages of ResNet-50 or ResNet-101.
    Take the 101-layer column and check whether it really has 101 layers: there is an initial 7x7, 64-channel convolution, then 3 + 4 + 23 + 3 = 33 building blocks of 3 layers each, i.e. 33 x 3 = 99 layers, and finally an fc layer for classification, so 1 + 99 + 1 = 101 layers indeed.
    Note: the 101 layers count only convolution and fully connected layers; activation and pooling layers are not included.
    Comparing the 50-layer and 101-layer columns, the only difference is in conv4_x: ResNet-50 has 6 blocks there while ResNet-101 has 23, a difference of 17 blocks, i.e. 17 x 3 = 51 layers.
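    That counting is easy to verify in a couple of lines (a quick sketch using the block configurations listed in Table 2):

    cfg = {50: [3, 4, 6, 3], 101: [3, 4, 23, 3], 152: [3, 8, 36, 3]}
    for depth, blocks in cfg.items():
        # conv1 + three conv layers per bottleneck block + the final fc layer
        print(depth, 1 + 3 * sum(blocks) + 1)      # 50 50 / 101 101 / 152 152
    print((cfg[101][2] - cfg[50][2]) * 3)          # 51: all the extra layers sit in conv4_x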

    4. Faster R-CNN with a ResNet-101 Backbone

    The paper applies ResNet-101 to Faster R-CNN and obtains better results:
    [Table 3: ResNet-101 Faster R-CNN results on PASCAL VOC 07/12 and COCO]
    A question arises here:
    Is the feature map shared by the RPN and Fast R-CNN in Faster R-CNN the output of conv5_x?
    To answer it, look at the actual structure of Faster R-CNN built on ResNet-101:
    [Figure 5: Faster R-CNN based on ResNet-101]
    Figure 5 shows the whole Faster R-CNN architecture, with the blue part being ResNet-101. The output of conv4_x is what the RPN and RoI pooling share, whereas conv5_x (9 layers in total) operates on the feature maps produced after RoI pooling (14 x 14 x 1024), whose size exactly matches the input that conv5_x expects in the original ResNet-101.
    Finally, remember that an average pooling layer is appended at the end to obtain a 2048-dimensional feature vector, which is then used for classification and bounding-box regression.
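    A shape-only sketch of that final step (assuming PyTorch; the sizes are the ones quoted above): conv5_x turns each RoI's 14x14x1024 feature map into a 7x7x2048 map, and global average pooling then yields the 2048-dimensional vector fed to the classification and box-regression heads.

    import torch

    roi_feature = torch.randn(1, 2048, 7, 7)    # output of conv5_x for a single RoI
    vec = roi_feature.mean(dim=(2, 3))          # global average pooling over height and width
    print(vec.shape)                            # torch.Size([1, 2048])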

  • Resnet

    2016-10-26 11:59:34

    In the previous post we said that more complex problems need deeper neural networks to fit them, but deeper networks are harder to train. The likely reasons are overfitting and the large number of poor local solutions of the loss function (perhaps too many saddle points: after the same number of epochs a deeper network can show higher training error than a shallower one, because crossing saddle points, where gradients are tiny, takes more epochs; see The Loss Surfaces of Multilayer Networks). This makes the model more likely to converge to a poor solution. That post suggested decomposing a complex problem into several simpler ones, training a simpler model for each, and finally concatenating their outputs and feeding them into an FC layer for classification.
    ResNet takes a different route: instead of decomposing the problem, it decomposes the network itself to lower the complexity of the function being fitted; since the loss is a function of the fitted function, lowering the complexity of the fitted function also lowers the complexity of the loss. Suppose the complex function we need to fit is H(x). We now decompose H(x) into two simpler functions f(x) and g(x), i.e. we let H(x) = f(x) + g(x). (ResNet does not concern itself with overfitting here.) The structure of f(x) is actually the same as that of H(x), a few convolution and pooling layers plus a final fc layer, but the addition of g(x) allows f(x) to be less complex than H(x), as shown in the figure below:

    [Figure: decomposing H(x) into f(x) + g(x)]

    Since we do not know in advance what the best g(x) looks like, g(x) also has to be trained. A three-layer neural network can approximate any continuous function on a closed interval, so we replace g(x) with a three-layer network; but that adds extra parameters and with them a risk of overfitting. The figure shows two three-layer networks, which can be merged, so a better arrangement is the following:
    [Figure: merging the two three-layer sub-networks]

    Now, because the fc layer's input has grown, its hidden units would also have to grow to maintain expressive power, which again adds parameters, so ResNet makes one more adjustment:
    [Figure: the adjusted structure]

  • 6. A Detailed Walkthrough of the ResNet Architecture

    2019-07-31 14:50:01
    ResNet was proposed in 2015 and took first place in the ImageNet classification task. Because it combines simplicity with practicality, many later methods were built on top of ResNet-50 or ResNet-101, and it is widely used in detection, segmentation, recognition and other fields. ...
  • ResNet-50, ResNet-101 and ResNet-152 implemented in PyTorch

    2019-01-12 21:10:37
    ResNet-50, ResNet-101 and ResNet-152 implemented in PyTorch: import torch import torch.nn as nn import torchvision print("PyTorch Version: ", torch.__version__) print("...
  • ResNet code

    2019-03-27 16:41:22
    A PyTorch implementation of ResNet, including resnet50, resnet101 and resnet161.
  • A ResNet implementation (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152) using TensorFlow 2.0. For more CNNs, see . Train. Requirements: Python >= 3.6, Tensorflow == 2.0.0. To train ResNet on your own dataset, you can ...
