  • A ResNet implementation in TensorFlow (91.5% accuracy on CIFAR-10)

    2018-09-07 14:57:28

    In the previous post I rewrote the TensorFlow CNN example and reached roughly 85% accuracy on the CIFAR-10 test set, using 2 convolutional layers and 2 fully connected layers. That model architecture was as follows:

    To push the accuracy higher we can adopt a more advanced architecture. One of the best known is ResNet, the residual network, proposed by Kaiming He in the 2015 paper "Deep Residual Learning for Image Recognition". The key observation is that for plain networks the training error actually grows as the depth increases, so the paper introduces residual connections: the input of a block is added directly to its output, as shown in the figure below. In theory, adding this structure should never make a deeper network worse than a shallower one: if the shallow part has already approached a good solution, the extra layers only need to learn an identity mapping (each layer's output equals its input), so additional depth does not add training error.
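
    As a minimal illustration of the idea (a standalone sketch, independent of the full model below; the layer sizes are arbitrary and it assumes the input already has `filters` channels):

    import tensorflow as tf
    
    def toy_residual_block(x, filters):
        # F(x): a small stack of layers, here two 3x3 convolutions
        f = tf.layers.conv2d(x, filters, (3, 3), padding='same', activation=tf.nn.relu)
        f = tf.layers.conv2d(f, filters, (3, 3), padding='same')
        # the residual connection: add the block input to the block output
        return tf.nn.relu(x + f)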

    In 2016 Kaiming He published a follow-up paper, "Identity Mappings in Deep Residual Networks", which further refines the ResNet block. The original and the improved block structures are shown in the figure below; according to the paper, the improved ("pre-activation") structure propagates the residual signal better in both the forward and the backward pass and therefore optimizes better:
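
    For comparison, an equally minimal pre-activation (v2) block would look as follows (again only a sketch; the real blocks below also handle striding and projection shortcuts):

    def toy_preact_residual_block(x, filters, training):
        # v2 ordering: BN -> ReLU -> conv, twice, then add the untouched shortcut
        f = tf.layers.batch_normalization(x, training=training)
        f = tf.layers.conv2d(tf.nn.relu(f), filters, (3, 3), padding='same')
        f = tf.layers.batch_normalization(f, training=training)
        f = tf.layers.conv2d(tf.nn.relu(f), filters, (3, 3), padding='same')
        return x + f  # note: no ReLU after the addition in v2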

    TensorFlow's official models repository already ships a ResNet implementation; trained at a depth of 110 layers it reaches about 92% accuracy on the CIFAR-10 test set. Unfortunately that code is quite hard to read, with many layers of helper wrappers that force you to jump back and forth through the files. So I rewrote it, building the model directly in the way described in Kaiming He's papers. At 110 layers (n = 18 in the 6n+2 scheme used below) it reaches about 91% accuracy, close to the official model.

    The code comes in two parts; the ResNet model itself lives in a separate file. In the code below, _resnet_block_v1 and _resnet_block_v2 correspond to the two block structures shown above:

    import tensorflow as tf
    
    def _resnet_block_v1(inputs, filters, stride, projection, stage, blockname, TRAINING):
        # defining name basis
        conv_name_base = 'res' + str(stage) + blockname + '_branch'
        bn_name_base = 'bn' + str(stage) + blockname + '_branch'
    
        with tf.name_scope("conv_block_stage" + str(stage)):
            if projection:
                shortcut = tf.layers.conv2d(inputs, filters, (1,1), 
                                            strides=(stride, stride), 
                                            name=conv_name_base + '1', 
                                            kernel_initializer=tf.contrib.layers.variance_scaling_initializer(), 
                                            reuse=tf.AUTO_REUSE, padding='same', 
                                            data_format='channels_first')
                shortcut = tf.layers.batch_normalization(shortcut, axis=1, name=bn_name_base + '1', 
                                                         training=TRAINING, reuse=tf.AUTO_REUSE)
            else:
                shortcut = inputs
    
            outputs = tf.layers.conv2d(inputs, filters,
                                      kernel_size=(3, 3),
                                      strides=(stride, stride), 
                                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer(), 
                                      name=conv_name_base+'2a', reuse=tf.AUTO_REUSE, padding='same', 
                                      data_format='channels_first')
            outputs = tf.layers.batch_normalization(outputs, axis=1, name=bn_name_base+'2a', 
                                                    training=TRAINING, reuse=tf.AUTO_REUSE)
            outputs = tf.nn.relu(outputs)
    	
            outputs = tf.layers.conv2d(outputs, filters,
                                      kernel_size=(3, 3),
                                      strides=(1, 1), 
                                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer(), 
                                      name=conv_name_base+'2b', reuse=tf.AUTO_REUSE, padding='same', 
                                      data_format='channels_first')
            outputs = tf.layers.batch_normalization(outputs, axis=1, name=bn_name_base+'2b', 
                                                    training=TRAINING, reuse=tf.AUTO_REUSE)
            outputs = tf.add(shortcut, outputs)
            outputs = tf.nn.relu(outputs)								  
        return outputs
    	
    def _resnet_block_v2(inputs, filters, stride, projection, stage, blockname, TRAINING):
        # defining name basis
        conv_name_base = 'res' + str(stage) + blockname + '_branch'
        bn_name_base = 'bn' + str(stage) + blockname + '_branch'
    
        with tf.name_scope("conv_block_stage" + str(stage)):
            shortcut = inputs
            outputs = tf.layers.batch_normalization(inputs, axis=1, name=bn_name_base+'2a', 
                                                    training=TRAINING, reuse=tf.AUTO_REUSE)
            outputs = tf.nn.relu(outputs)		
            if projection:
                shortcut = tf.layers.conv2d(outputs, filters, (1,1), 
                                            strides=(stride, stride), 
                                            name=conv_name_base + '1', 
                                            kernel_initializer=tf.contrib.layers.variance_scaling_initializer(), 
                                            reuse=tf.AUTO_REUSE, padding='same', 
                                            data_format='channels_first')
                shortcut = tf.layers.batch_normalization(shortcut, axis=1, name=bn_name_base + '1', 
                                                         training=TRAINING, reuse=tf.AUTO_REUSE)
    								
            outputs = tf.layers.conv2d(outputs, filters,
                                      kernel_size=(3, 3),
                                      strides=(stride, stride), 
                                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer(), 
                                      name=conv_name_base+'2a', reuse=tf.AUTO_REUSE, padding='same', 
                                      data_format='channels_first')
            
            outputs = tf.layers.batch_normalization(outputs, axis=1, name=bn_name_base+'2b', 
                                                    training=TRAINING, reuse=tf.AUTO_REUSE)
            outputs = tf.nn.relu(outputs)
            outputs = tf.layers.conv2d(outputs, filters,
                                      kernel_size=(3, 3),
                                      strides=(1, 1),
                                      kernel_initializer=tf.contrib.layers.variance_scaling_initializer(),
                                      name=conv_name_base+'2b', reuse=tf.AUTO_REUSE, padding='same', 
                                      data_format='channels_first')
    
            outputs = tf.add(shortcut, outputs)
        return outputs
    
    def inference(images, training, filters, n, ver):
        """Construct the resnet model
    
    Args:
      images: [batch, channel, height, width]
      training: boolean
      filters: integer, the filters of the first resnet stage, the next stage will have filters*2
      n: integer, how many resnet blocks in each stage, the total layers number is 6n+2
      ver: integer, can be 1 or 2, for resnet v1 or v2
        Returns:
          Tensor, model inference output
        """
        #Layer1 is a 3*3 conv layer, input channels are 3, output channels are 16
        inputs = tf.layers.conv2d(images, filters=16, kernel_size=(3, 3), strides=(1, 1), 
                                  name='conv1', reuse=tf.AUTO_REUSE, padding='same', data_format='channels_first')
    
    #no need to batch-normalize and activate here for the version 2 resnet.
        if ver==1:
            inputs = tf.layers.batch_normalization(inputs, axis=1, name='bn_conv1',
                                                   training=training, reuse=tf.AUTO_REUSE)
            inputs = tf.nn.relu(inputs)
    
        for stage in range(3):
            stage_filter = filters*(2**stage)
            for i in range(n):
                stride = 1
                projection = False
                if i==0 and stage>0:
                    stride = 2
                    projection = True
                if ver==1:
                    inputs = _resnet_block_v1(inputs, stage_filter, stride, projection, 
    				                          stage, blockname=str(i), TRAINING=training)
                else:
                    inputs = _resnet_block_v2(inputs, stage_filter, stride, projection, 
    				                          stage, blockname=str(i), TRAINING=training)
    
    #only needed for the version 2 resnet.
        if ver==2:
            inputs = tf.layers.batch_normalization(inputs, axis=1, name='pre_activation_final_norm', 
                                                   training=training, reuse=tf.AUTO_REUSE)
            inputs = tf.nn.relu(inputs)
    
        axes = [2, 3]
        inputs = tf.reduce_mean(inputs, axes, keep_dims=True)
        inputs = tf.identity(inputs, 'final_reduce_mean')
    
        inputs = tf.reshape(inputs, [-1, filters*(2**2)])
        inputs = tf.layers.dense(inputs=inputs, units=10, name='dense1', reuse=tf.AUTO_REUSE)
        return inputs

    The other part of the code handles the CIFAR-10 data: of the 50,000 training images, 45,000 are used for training and the remaining 5,000 for validation, while all 10,000 test images form the test set. At a depth of 98 layers the test-set accuracy reaches about 92%.
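
    Each record in the CIFAR-10 binary files is 1 label byte followed by 3072 image bytes stored as three 32x32 channel planes (R, then G, then B), which is why the code below slices off byte 0 as the label, reshapes the rest to [depth, height, width] and then transposes it. A quick standalone check of that layout (a sketch, not part of the training script):

    import numpy as np
    
    with open('cifar-10-batches-bin/data_batch_1.bin', 'rb') as f:
        record = np.frombuffer(f.read(1 + 32*32*3), dtype=np.uint8)  # read the first record only
    label = int(record[0])                                    # class id, 0..9
    image = record[1:].reshape(3, 32, 32).transpose(1, 2, 0)  # CHW -> HWC
    print(label)        # the class id of the first image
    print(image.shape)  # (32, 32, 3)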

    import tensorflow as tf
    import numpy as np
    import os
    import resnet_model
    
    #Construct the filenames that include the train cifar10 images
    folderPath = 'cifar-10-batches-bin/'
    filenames = [os.path.join(folderPath, 'data_batch_%d.bin' % i) for i in xrange(1,6)]
    
    #Define the parameters of the cifar10 image
    imageWidth = 32
    imageHeight = 32
    imageDepth = 3
    label_bytes = 1
    
    #Define the train and test batch size
    batch_size = 100
    test_batch_size = 100
    valid_batch_size = 100
    
    #Calculate the per-image bytes and record bytes
    image_bytes = imageWidth * imageHeight * imageDepth
    record_bytes = label_bytes + image_bytes
    
    #Construct the dataset to read the train images
    dataset = tf.data.FixedLengthRecordDataset(filenames, record_bytes)
    dataset = dataset.shuffle(50000)
    
    #Get the first 45000 records as train dataset records
    train_dataset = dataset.take(45000)
    train_dataset = train_dataset.batch(batch_size)
    train_dataset = train_dataset.repeat(300)
    iterator = train_dataset.make_initializable_iterator()
    
    #Get the remain 5000 records as valid dataset records
    valid_dataset = dataset.skip(45000)
    valid_dataset = valid_dataset.batch(valid_batch_size)
    validiterator = valid_dataset.make_initializable_iterator()
    
    #Construct the dataset to read the test images
    testfilename = os.path.join(folderPath, 'test_batch.bin')
    testdataset = tf.data.FixedLengthRecordDataset(testfilename, record_bytes)
    testdataset = testdataset.batch(test_batch_size)
    testiterator = testdataset.make_initializable_iterator()
    
    #Decode the train records from the iterator
    record = iterator.get_next()
    record_decoded_bytes = tf.decode_raw(record, tf.uint8)
    
    #Get the labels from the records
    record_labels = tf.slice(record_decoded_bytes, [0, 0], [batch_size, 1])
    record_labels = tf.cast(record_labels, tf.int32)
    
    #Get the images from the records
    record_images = tf.slice(record_decoded_bytes, [0, 1], [batch_size, image_bytes])
    record_images = tf.reshape(record_images, [batch_size, imageDepth, imageHeight, imageWidth])
    record_images = tf.transpose(record_images, [0, 2, 3, 1])
    record_images = tf.cast(record_images, tf.float32)
    
    #Decode the records from the valid iterator
    validrecord = validiterator.get_next()
    validrecord_decoded_bytes = tf.decode_raw(validrecord, tf.uint8)
    
    #Get the labels from the records
    validrecord_labels = tf.slice(validrecord_decoded_bytes, [0, 0], [valid_batch_size, 1])
    validrecord_labels = tf.cast(validrecord_labels, tf.int32)
    validrecord_labels = tf.reshape(validrecord_labels, [-1])
    
    #Get the images from the records
    validrecord_images = tf.slice(validrecord_decoded_bytes, [0, 1], [valid_batch_size, image_bytes])
    validrecord_images = tf.cast(validrecord_images, tf.float32)
    validrecord_images = tf.reshape(validrecord_images, 
                                   [valid_batch_size, imageDepth, imageHeight, imageWidth])
    validrecord_images = tf.transpose(validrecord_images, [0, 2, 3, 1])
    
    #Decode the test records from the iterator
    testrecord = testiterator.get_next()
    testrecord_decoded_bytes = tf.decode_raw(testrecord, tf.uint8)
    
    #Get the labels from the records
    testrecord_labels = tf.slice(testrecord_decoded_bytes, [0, 0], [test_batch_size, 1])
    testrecord_labels = tf.cast(testrecord_labels, tf.int32)
    testrecord_labels = tf.reshape(testrecord_labels, [-1])
    
    #Get the images from the records
    testrecord_images = tf.slice(testrecord_decoded_bytes, [0, 1], [test_batch_size, image_bytes])
    testrecord_images = tf.cast(testrecord_images, tf.float32)
    testrecord_images = tf.reshape(testrecord_images, 
                                   [test_batch_size, imageDepth, imageHeight, imageWidth])
    testrecord_images = tf.transpose(testrecord_images, [0, 2, 3, 1])
    
    #Random crop the images after pad each side with 4 pixels
    distorted_images = tf.image.resize_image_with_crop_or_pad(record_images, 
                                                              imageHeight+8, imageWidth+8)
    distorted_images = tf.random_crop(distorted_images, size = [batch_size, imageHeight, imageWidth, 3])
    
    #Unstack the images as the follow-up operations work on a single training image
    distorted_images = tf.unstack(distorted_images)
    for i in xrange(len(distorted_images)):
        distorted_images[i] = tf.image.random_flip_left_right(distorted_images[i])
        distorted_images[i] = tf.image.random_brightness(distorted_images[i], max_delta=63)
        distorted_images[i] = tf.image.random_contrast(distorted_images[i], lower=0.2, upper=1.8)
        distorted_images[i] = tf.image.per_image_standardization(distorted_images[i])
        
    #Stack the images
    distorted_images = tf.stack(distorted_images)
    
    #transpose to set the channel first
    distorted_images = tf.transpose(distorted_images, perm=[0, 3, 1, 2])
    
    #Unstack the images as the follow-up operations work on a single image
    validrecord_images = tf.unstack(validrecord_images)
    for i in xrange(len(validrecord_images)):
        validrecord_images[i] = tf.image.per_image_standardization(validrecord_images[i])
        
    #Stack the images
    validrecord_images = tf.stack(validrecord_images)
    
    #transpose to set the channel first
    validrecord_images = tf.transpose(validrecord_images, perm=[0, 3, 1, 2])
    
    #Unstack the images as the follow-up operations work on a single image
    testrecord_images = tf.unstack(testrecord_images)
    for i in xrange(len(testrecord_images)):
        testrecord_images[i] = tf.image.per_image_standardization(testrecord_images[i])
        
    #Stack the images
    testrecord_images = tf.stack(testrecord_images)
    
    #transpose to set the channel first
    testrecord_images = tf.transpose(testrecord_images, perm=[0, 3, 1, 2])
    
    global_step = tf.Variable(0, trainable=False)
    boundaries = [10000, 15000, 20000, 25000]
    values = [0.1, 0.05, 0.01, 0.005, 0.001]
    learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
    weight_decay = 2e-4
    filters = 16  #the first resnet block filter number
    n = 5  #the basic resnet block number, total network layers are 6n+2
    ver = 2   #the resnet block version
    
    #Get the inference logits by the model
    result = resnet_model.inference(distorted_images, True, filters, n, ver)
    
    #Calculate the cross entropy loss
    cross_entropy = tf.losses.sparse_softmax_cross_entropy(labels=record_labels, logits=result)
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    
    #Add L2 weight decay of the trainable variables to the loss
    l2_loss = weight_decay * tf.add_n(
        # loss is computed using fp32 for numerical stability.
        [tf.nn.l2_loss(tf.cast(v, tf.float32)) for v in tf.trainable_variables()])
    tf.summary.scalar('l2_loss', l2_loss)
    loss = cross_entropy_mean + l2_loss
    
    #Define the optimizer
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    
    #Relate to the batch normalization
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        opt_op = optimizer.minimize(loss, global_step)
    
    valid_accuracy = tf.placeholder(tf.float32)
    test_accuracy = tf.placeholder(tf.float32)
    tf.summary.scalar("valid_accuracy", valid_accuracy)
    tf.summary.scalar("test_accuracy", test_accuracy)
    tf.summary.scalar("learning_rate", learning_rate)
    
    validresult = tf.argmax(resnet_model.inference(validrecord_images, False, filters, n, ver), axis=1)
    testresult = tf.argmax(resnet_model.inference(testrecord_images, False, filters, n, ver), axis=1)
    
    #Create the session and run the graph
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    sess.run(iterator.initializer)
    
    #Merge all the summary and write
    summary_op = tf.summary.merge_all()
    train_filewriter = tf.summary.FileWriter('train/', sess.graph)
    
    #Create the saver used to checkpoint the model inside the training loop
    if not os.path.exists('model'):
        os.makedirs('model')
    saver = tf.train.Saver()
    
    step = 0
    while(True):
        try:
            lossValue, lr, _ = sess.run([loss, learning_rate, opt_op])
            if step % 100 == 0:
                print "step %i: Learning_rate: %f Loss: %f" %(step, lr, lossValue)
            if step % 1000 == 0:
                saver.save(sess, 'model/my-model', global_step=step)
                truepredictNum = 0
                sess.run([testiterator.initializer, validiterator.initializer])
                accuracy1 = 0.0
                accuracy2 = 0.0
                while(True):
                    try:
                        predictValue, testValue = sess.run([validresult, validrecord_labels])
                        truepredictNum += np.sum(predictValue==testValue)
                    except tf.errors.OutOfRangeError:
                        print "valid correct num: %i" %(truepredictNum)
                        accuracy1 = truepredictNum / 5000.0
                        break
                truepredictNum = 0
                while(True):
                    try:
                        predictValue, testValue = sess.run([testresult, testrecord_labels])
                        truepredictNum += np.sum(predictValue==testValue)
                    except tf.errors.OutOfRangeError:
                        print "test correct num: %i" %(truepredictNum)
                        accuracy2 = truepredictNum / 10000.0
                        break
                summary = sess.run(summary_op, feed_dict={valid_accuracy: accuracy1, test_accuracy: accuracy2})
                train_filewriter.add_summary(summary, step)
            step += 1
        except tf.errors.OutOfRangeError:
            break
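
    The loop above writes a checkpoint every 1000 steps through the saver; to reuse a trained model later, the same graph can be rebuilt and the variables restored from the latest checkpoint (a sketch, assuming the graph definition above has already been executed in the current process):

    #Restore the most recent checkpoint instead of training from scratch
    restorer = tf.train.Saver()
    with tf.Session() as restore_sess:
        restorer.restore(restore_sess, tf.train.latest_checkpoint('model/'))
        #restore_sess can now run testresult / validresult without retraining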

     

    Original post: https://blog.csdn.net/gzroy/article/details/82386540

  • Admittedly my code may be rough... the parameters are not finely tuned and the accuracy gain is not dramatic. First, the data-handling part; I have implemented several input pipelines myself: a multi-threaded read-and-preprocess pipeline (tf.data), an older queue-based multi-threaded pipeline, and a plain single-threaded loader (used the same way as the built-in MNIST...

    // The earlier version of this code had problems, so I am reposting it.
    This post is a cleanup: a complete ResNet code implementation,
    plus ResNeXt and SENet (SE-ResNeXt was the winning model of the final ImageNet competition).
    It does not explain the ideas behind ResNet or DenseNet in detail; it only gives a rough implementation, the reasoning behind how it is written, and a few small experiments of my own.
    Admittedly my code may be rough... the hyperparameters are not finely tuned, so the accuracy gain is not dramatic.
    First, the data-handling part. I have implemented several input pipelines myself:
    a multi-threaded read-and-preprocess pipeline based on tf.data,
    the older queue-based multi-threaded pipeline,
    and a plain single-threaded loader (used the same way as the built-in MNIST reader).
    Here I use the multi-threaded tf.data pipeline.
    The native CIFAR-10 dataset ships as Python pickles; to get individual image files, see my other post on extracting the CIFAR-10 images.
    The main goal is to convert them to TFRecord format, which makes reading very fast (TensorFlow's own format is well optimized).
    I have already trained VGG and AlexNet before, so I do not rerun them here; with a moving average of the weights their accuracy is already above 85%.
    My own AlexNet/VGG16
    (both with SENet blocks embedded and the fully connected layers removed; you could even drop the pooling layers and replace them with stride-2 convolutions, which would make the network look like a fully convolutional network, FCN)
    replace the fully connected layers with global average pooling plus 1x1 convolutions (see the short sketch after this overview), and also add BN, SENet and other structures that appeared later;
    they reach above 85% accuracy.
    Deeper networks (VGG19) are hard to train without BN; when I ran one on Kaggle it was still at 50% accuracy after 10 epochs.
    (figure: SE block diagram)
    SENet feels like a kind of attention mechanism.
    The difference between ResNet and ResNeXt is not large: the convolution is split into several parallel paths, as in Inception, except that every path has the same structure.
    (figures: ResNeXt block, three equivalent formulations)
    There are three equivalent formulations; I use structure (c) here.
    The learning rate uses a warmup strategy, so it can be set to a fairly large value (ResNet50 basically trains fine even at 0.036, though it occasionally blows up mid-training).
    Some of the hyperparameters of the network:
    Optimizer: MomentumOptimizer (SGD with momentum)
    conv_initializer: xavier (the default kernel_initializer of tf.layers.conv2d; in my tests better than a truncated normal with stddev=0.01)
    momentum: 0.9
    warmup_step: batch_num*10
    learning_rate_base: 0.012
    learning_rate_decay: 0.96
    learning_rate_step: batch_num
    train_epochs: 140
    batch_size: 128
    batch_num: 50000/batch_size
    SE_rate: 32
    cardinality: 8
    regularizer_rate: 0.008
    drop_rate: 0.3
    exponential_moving_average_decay: 0.999
    Although CIFAR-10 images are only 32x32,
    they are resized to 64x64 or 128x128 before being fed to the network (the difference in test accuracy is within 1%, so it can be ignored).
    Full code:
    ResNet50/ResNeXt/SE-AlexNet
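
    A minimal sketch of the "global average pooling + 1x1 convolution instead of fully connected layers" idea mentioned above (standalone and illustrative; it is not the FC_conv/GlobalAvgPool pair defined in the class below, just the same trick in isolation):

    import tensorflow as tf
    
    def gap_1x1_head(feature_map, num_classes=10):
        # feature_map: [batch, H, W, C]; average over H and W -> [batch, 1, 1, C]
        gap = tf.reduce_mean(feature_map, axis=[1, 2], keepdims=True)
        # a 1x1 convolution on a 1x1 spatial map behaves exactly like a dense layer over the C channels
        logits = tf.layers.conv2d(gap, num_classes, (1, 1))
        return tf.reshape(logits, [-1, num_classes])  # [batch, num_classes]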
    The data-processing part:

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    from time import time
    from PIL import Image
    from sklearn.utils import shuffle
    from tqdm import tqdm_notebook as tqdm
    import os
    data_path="C:\\Users\\Shijunfeng\\tensorflow_gpu\\dataseats\\cifar10\\Image"
    mode_path="C:\\Users\\Shijunfeng\\tensorflow_gpu\\dataseats\\cifar10\\Image\\TFMode"
    tf.reset_default_graph()
    def imshows(classes,images,labels,index,amount,predictions=None):
        #classes: list of class names
        #images: array of images
        #labels: array of one-hot labels
        #index, amount: display `amount` images starting at position `index`
        #predictions: optional array of prediction results
        fig=plt.gcf()
        fig.set_size_inches(10,20) #adjust the figure size as convenient
        for i in range(amount):
            title="lab:"+classes[np.argmax(labels[index+i])]
            if predictions is not None:
                title=title+" prd:"+classes[np.argmax(predictions[index+i])]
            ax=plt.subplot(5,6,i+1) #6 plots per row, 5 rows
            ax.set_title(title)
            ax.imshow(images[index+i])
        plt.show()
    class DatasetReader(object):
        def __init__(self,data_path,image_size=None):
            self.data_path=data_path
            self.img_size=image_size
            self.img_size.append(3)
            self.train_path=os.path.join(data_path,"train")
            self.test_path=os.path.join(data_path,"test")
            self.TF_path=os.path.join(data_path,"TFRecordData")
            self.tf_train_path=os.path.join(self.TF_path,"train")
            self.tf_test_path=os.path.join(self.TF_path,"test")
            self.classes=os.listdir(self.train_path)
            self.__Makedirs()
            self.train_batch_initializer=None
            self.test_batch_initializer=None
            self.__CreateTFRecord(self.train_path,self.tf_train_path)
            self.__CreateTFRecord(self.test_path,self.tf_test_path)
        def __CreateTFRecord(self,read_path,save_path):
            path=os.path.join(save_path,"data.TFRecord")
            if os.path.exists(path):
                print("find file "+(os.path.join(save_path,"data.TFRecords")))
                return
            else:
                print("cannot find file %s,ready to recreate"%(os.path.join(save_path,"data.TFRecords")))
            writer=tf.python_io.TFRecordWriter(path=path)
            image_path=[]
            image_label=[]
            image_size=[int(self.img_size[0]*1.3),int(self.img_size[1]*1.3)]
            for label,class_name in enumerate(self.classes):
                class_path=os.path.join(read_path,class_name)
                for image_name in os.listdir(class_path):
                    image_path.append(os.path.join(class_path,image_name))
                    image_label.append(label)
            for i in range(5):image_path,image_label=shuffle(image_path,image_label)
            for i in tqdm(range(len(image_path)),desc="TFRecord"):
                image,label=Image.open(image_path[i]).resize(image_size,Image.ANTIALIAS),image_label[i]
                image=image.convert("RGB")
                image=image.tobytes()
                example=tf.train.Example(features=tf.train.Features(feature={
                            "label":tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                            "image":tf.train.Feature(bytes_list=tf.train.BytesList(value=[image]))
                        }))
                writer.write(example.SerializeToString())
            writer.close()
        def __Makedirs(self):
            if not os.path.exists(self.TF_path):
                os.makedirs(self.TF_path)
            if not os.path.exists(self.tf_train_path):
                os.makedirs(self.tf_train_path)
            if not os.path.exists(self.tf_test_path):
                os.makedirs(self.tf_test_path)
        def __parsed(self,tensor):
            raw_image_size=[int(self.img_size[0]*1.3),int(self.img_size[1]*1.3),3]
            feature=tf.parse_single_example(tensor,features={
                        "image":tf.FixedLenFeature([],tf.string),
                        "label":tf.FixedLenFeature([],tf.int64)
                })
            image=tf.decode_raw(feature["image"],tf.uint8)
            image=tf.reshape(image,raw_image_size)
            image=tf.random_crop(image,self.img_size)
            image=tf.image.per_image_standardization(image)
            label=tf.cast(feature["label"],tf.int32)
            label=tf.one_hot(label,len(self.classes))
            return image,label
        def __parsed_distorted(self,tensor):
            raw_image_size=[int(self.img_size[0]*1.3),int(self.img_size[1]*1.3),3]
            feature=tf.parse_single_example(tensor,features={
                        "image":tf.FixedLenFeature([],tf.string),
                        "label":tf.FixedLenFeature([],tf.int64)
                })
            image=tf.decode_raw(feature["image"],tf.uint8)
            image=tf.reshape(image,raw_image_size)
            image=tf.random_crop(image,self.img_size)
            image=tf.image.random_flip_left_right(image)
            image=tf.image.random_flip_up_down(image)
            image=tf.image.random_brightness(image,max_delta=0.4)
            image=tf.image.random_hue(image,max_delta=0.4)
            image=tf.image.random_contrast(image,lower=0.7,upper=1.4)
            image=tf.image.random_saturation(image,lower=0.7,upper=1.4)
            image=tf.image.per_image_standardization(image)
            label=tf.cast(feature["label"],tf.int32)
            label=tf.one_hot(label,len(self.classes))
            return image,label
        def __GetBatchIterator(self,path,parsed,batch_size):
            filename=[os.path.join(path,name)for name in os.listdir(path)]
            dataset=tf.data.TFRecordDataset(filename)
            dataset=dataset.prefetch(tf.contrib.data.AUTOTUNE)
            dataset=dataset.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=10000,count=None,seed=99997))
            dataset=dataset.apply(tf.data.experimental.map_and_batch(parsed,batch_size))
            dataset=dataset.apply(tf.data.experimental.prefetch_to_device("/gpu:0"))
            iterator=dataset.make_initializable_iterator()
            return iterator.initializer,iterator.get_next()
        def __detail(self,path):
            Max=-1e9
            Min=1e9
            print("train dataset:")
            path=[os.path.join(path,name)for name in self.classes]
            for i in range(len(self.classes)):
                num=len(os.listdir(path[i]))
                print("%-12s:%3d"%(self.classes[i],num))
                Max=max(Max,num)
                Min=min(Min,num)
            print("max:%d min:%d"%(Max,Min))
        def detail(self):
            print("class num:",len(self.classes))
            self.__detail(self.train_path)
            self.__detail(self.test_path)
        def global_variables_initializer(self):
            initializer=[]
            initializer.append(self.train_batch_initializer)
            initializer.append(self.test_batch_initializer)
            initializer.append(tf.global_variables_initializer())
            return initializer
        def test_batch(self,batch_size):
            self.test_batch_initializer,batch=self.__GetBatchIterator(self.tf_test_path,self.__parsed,batch_size)
            return batch
        def train_batch(self,batch_size):
            self.train_batch_initializer,batch=self.__GetBatchIterator(self.tf_train_path,self.__parsed_distorted,batch_size)
            return batch
    

    Network definition:

    #tf.reset_default_graph()
    def warmup_exponential_decay(warmup_step,rate_base,global_step,decay_step,decay_rate,staircase=False):
        linear_increase=rate_base*tf.cast(global_step/warmup_step,tf.float32)
        exponential_decay=tf.train.exponential_decay(rate_base,global_step-warmup_step,decay_step,decay_rate,staircase)
        return tf.cond(global_step<=warmup_step,lambda:linear_increase,lambda:exponential_decay)
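    
    # Sanity check of the schedule above (same formula written out; the values mirror
    # those set later in this script: base 0.003, warmup batch_num*3, decay 0.97 every
    # batch_num steps -- adjust accordingly if you change them below):
    #   lr(step) = base * step / warmup_step                            while step <= warmup_step
    #   lr(step) = base * decay ** ((step - warmup_step) / decay_step)  afterwards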
    
    class ConvNet(object):
        def __init__(self,training=False,regularizer_rate=0.0,drop_rate=0.0,SE_rate=32,cardinality=8,average_class=None):
            self.training=training
            self.SE_rate=SE_rate
            self.cardinality=cardinality
            self.drop_rate=drop_rate
            if regularizer_rate!=0.0:self.regularizer=tf.contrib.layers.l2_regularizer(regularizer_rate)
            else:self.regularizer=None
            if average_class is not None:self.ema=average_class.average
            else:self.ema=None
        def Xavier(self):
            #Xavier/Glorot initializer for the convolution kernels
            return tf.contrib.layers.xavier_initializer_conv2d()
        def BatchNorm(self,x,name="BatchNorm"):
            return tf.layers.batch_normalization(x,training=self.training,reuse=tf.AUTO_REUSE)
        def Dropout(self,x):
            return tf.layers.dropout(x,rate=self.drop_rate,training=self.training)
        def Conv2d(self,x,filters,ksize,strides=[1,1],padding="SAME",activation=tf.nn.relu,use_bn=True,dilation=[1,1],name="conv"):
            #Conv BN Relu
            with tf.variable_scope(name,reuse=tf.AUTO_REUSE):
                input_channel=x.shape.as_list()[-1]
                kernel=tf.get_variable("kernel",ksize+[input_channel,filters],initializer=self.Xavier())
                bias=tf.get_variable("bias",[filters],initializer=tf.constant_initializer(0.0))
                if self.ema is not None and self.training==False: #use the moving-average (shadow) weights at inference time
                    kernel,bias=self.ema(kernel),self.ema(bias)
                conv=tf.nn.conv2d(x,kernel,strides=[1]+strides+[1],padding=padding,dilations=[1]+dilation+[1],name=name)
                bias_add=tf.nn.bias_add(conv,bias)
                if use_bn:bias_add=self.BatchNorm(bias_add) #BN
                if activation is not None:bias_add=activation(bias_add) #Relu
                if self.regularizer is not None:tf.add_to_collection("losses",self.regularizer(kernel))
            return bias_add
        def CardConv2d(self,x,filters,ksize,strides=[1,1],padding="SAME",activation=tf.nn.relu,use_bn=True,dilation=[1,1],name="cardconv"):
            '''Grouped (multi-path) convolution used by ResNeXt'''
            with tf.variable_scope(name,reuse=tf.AUTO_REUSE):
                #split the channels into `cardinality` groups, convolve each group, then concatenate them
                split=tf.split(x,num_or_size_splits=self.cardinality,axis=-1)
                filters=int(filters/self.cardinality)
                for i in range(len(split)):
                    split[i]=self.Conv2d(split[i],filters,ksize,strides,padding,activation,use_bn,dilation,name+"_"+str(i))
                merge=tf.concat(split,axis=-1)
                return merge
        def MaxPool(self,x,ksize,strides=[1,1],padding="SAME",name="max_pool"):
            input_channel=x.shape.as_list()[-1]
            return tf.nn.max_pool(x,ksize=[1]+ksize+[1],strides=[1]+strides+[1],padding=padding,name=name)
        def GlobalAvgPool(self,x,name="GAP"):
            ksize=[1]+x.shape.as_list()[1:-1]+[1]
            return tf.nn.avg_pool(x,ksize=ksize,strides=[1,1,1,1],padding="VALID",name=name)
        def FC_conv(self,x,dim,activation=tf.nn.relu,use_bn=True,name="conv_1x1"):
            '''1x1 convolution used in place of a fully connected layer'''
            return self.Conv2d(x,dim,[1,1],activation=activation,use_bn=use_bn,name=name)
        def SENet(self,x,name="SENet"):
            #Squeeze-and-Excitation block: global average pooling (squeeze), two 1x1
            #convolutions (excitation), then rescale the input channel-wise
            with tf.variable_scope(name,reuse=tf.AUTO_REUSE):
                channel=x.shape.as_list()[-1]
                squeeze=self.GlobalAvgPool(x,name="squeeze")
                excitation=self.FC_conv(squeeze,int(channel/self.SE_rate),name="excitation_1")
                excitation=self.FC_conv(excitation,channel,activation=tf.nn.sigmoid,name="excitation_2")
            return excitation*x
        def Residual(self,x,channels,name="Residual"):
            with tf.variable_scope(name,tf.AUTO_REUSE):
                res=self.Conv2d(x,channels[0],[1,1],name="conv_1")
                res=self.Conv2d(res,channels[1],[3,3],name="conv_2")
                res=self.Conv2d(res,channels[2],[1,1],activation=None,use_bn=True,name="conv_3")
                res=self.SENet(res)
                if x.shape.as_list()[-1]!=channels[2]:
                    x=self.Conv2d(x,channels[2],[1,1],activation=None,use_bn=False,name="conv_linear")
                return tf.nn.relu(res+x)
        def ResidualX(self,x,channels,name="Residual"):
            with tf.variable_scope(name,tf.AUTO_REUSE):
                res=self.Conv2d(x,channels[0],[1,1],name="conv_1")
                res=self.CardConv2d(res,channels[1],[3,3],name="conv_2")
                res=self.Conv2d(res,channels[2],[1,1],activation=None,use_bn=True,name="conv_3")
                res=self.SENet(res)
                if x.shape.as_list()[-1]!=channels[2]:
                    x=self.Conv2d(x,channels[2],[1,1],activation=None,use_bn=False,name="conv_linear")
                return tf.nn.relu(res+x)
        def ResNet(self,x):
            x=self.Conv2d(x,64,[7,7],[2,2],name="conv1")
            x=self.MaxPool(x,[3,3],[2,2],name="pool1")
            
            x=self.Residual(x,[64,64,128],name="Residual1_1")
            x=self.Residual(x,[64,64,128],name="Residual1_2")
            x=self.Residual(x,[64,64,128],name="Residual1_3")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            
            x=self.Residual(x,[128,128,256],name="Residual2_1")
            x=self.Residual(x,[128,128,256],name="Residual2_2")
            x=self.Residual(x,[128,128,256],name="Residual2_3")
            x=self.Residual(x,[128,128,256],name="Residual2_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            
            x=self.Residual(x,[256,256,512],name="Residual3_1")
            x=self.Residual(x,[256,256,512],name="Residual3_2")
            x=self.Residual(x,[256,256,512],name="Residual3_3")
            x=self.Residual(x,[256,256,512],name="Residual3_4")
            x=self.Residual(x,[256,256,512],name="Residual3_5")
            x=self.Residual(x,[256,256,512],name="Residual3_6")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            
            x=self.Residual(x,[512,512,1024],name="Residual4_1")
            x=self.Residual(x,[512,512,1024],name="Residual4_2")
            x=self.Residual(x,[512,512,1024],name="Residual4_3")
            x=self.Residual(x,[512,512,1024],name="Residual4_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            
            x=self.GlobalAvgPool(x,name="GAP")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,512,name="FC1")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,10,activation=None,use_bn=False,name="FC2")
            
            return tf.reshape(x,[-1,10])
        def ResNeXt(self,x):
            x=self.Conv2d(x,64,[7,7],[2,2],name="conv1")
            x=self.MaxPool(x,[3,3],[2,2],name="pool1")
            
            x=self.ResidualX(x,[64,64,128],name="Residual1_1")
            x=self.ResidualX(x,[64,64,128],name="Residual1_2")
            x=self.ResidualX(x,[64,64,128],name="Residual1_3")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            
            x=self.ResidualX(x,[128,128,256],name="Residual2_1")
            x=self.ResidualX(x,[128,128,256],name="Residual2_2")
            x=self.ResidualX(x,[128,128,256],name="Residual2_3")
            x=self.ResidualX(x,[128,128,256],name="Residual2_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            
            x=self.ResidualX(x,[256,256,512],name="Residual3_1")
            x=self.ResidualX(x,[256,256,512],name="Residual3_2")
            x=self.ResidualX(x,[256,256,512],name="Residual3_3")
            x=self.ResidualX(x,[256,256,512],name="Residual3_4")
            x=self.ResidualX(x,[256,256,512],name="Residual3_5")
            x=self.ResidualX(x,[256,256,512],name="Residual3_6")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            
            x=self.ResidualX(x,[512,512,1024],name="Residual4_1")
            x=self.ResidualX(x,[512,512,1024],name="Residual4_2")
            x=self.ResidualX(x,[512,512,1024],name="Residual4_3")
            x=self.ResidualX(x,[512,512,1024],name="Residual4_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            
            x=self.GlobalAvgPool(x,name="GAP")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,512,name="FC1")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,10,activation=None,use_bn=False,name="FC2")
            
            return tf.reshape(x,[-1,10])
        def VGG19(self,x):
            x=self.Conv2d(x,64,[3,3],[1,1],name="conv1_1")
            x=self.Conv2d(x,64,[3,3],[1,1],name="conv1_2")
            x=self.MaxPool(x,[3,3],[2,2],name="pool1")
            x=self.SENet(x,name="SE_1")
            
            x=self.Conv2d(x,128,[3,3],[1,1],name="conv2_1")
            x=self.Conv2d(x,128,[3,3],[1,1],name="conv2_2")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            x=self.SENet(x,name="SE_2")
            
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv3_1")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv3_2")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv3_3")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv3_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            x=self.SENet(x,name="SE_3")
            
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv4_1")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv4_2")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv4_3")
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv4_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool4")
            x=self.SENet(x,name="SE_4")
            
            x=self.Conv2d(x,512,[3,3],[1,1],name="conv5_1")
            x=self.Conv2d(x,512,[3,3],[1,1],name="conv5_2")
            x=self.Conv2d(x,512,[3,3],[1,1],name="conv5_3")
            x=self.Conv2d(x,512,[3,3],[1,1],name="conv5_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool5")
            
            x=self.GlobalAvgPool(x,name="GAP")
            
            x=self.FC_conv(x,128,name="FC_1")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,10,activation=None,bn=False,name="FC_2")
            
            return tf.reshape(x,[-1,10])
        def AlexNet(self,x):
            x=self.Conv2d(x,96,[11,11],[4,4],padding="valid",name="conv_1")
            x=self.Maxpool2d(x,[3,3],[2,2],padding="valid",name="pool_1")
            x=self.SENet(x,name="SE_1")
            
            x=self.Conv2d(x,256,[5,5],[2,2],name="conv_2")
            x=self.Maxpool2d(x,[3,3],[2,2],name="pool_2")
            x=self.SENet(x,name="SE_2")
            
            x=self.Conv2d(x,512,[3,3],[1,1],name="conv_3")
            x=self.SENet(x,name="SE_3")
            
            x=self.Conv2d(x,384,[3,3],[1,1],name="conv_4")
            x=self.SENet(x,name="SE_4")
            
            x=self.Conv2d(x,256,[3,3],[1,1],name="conv_5")
            x=self.Maxpool2d(x,[3,3],[2,2],name="pool_3")
            x=self.SENet(x,name="SE_5")
            
            x=self.GlobalAveragPool(x,name="GAP")
            x=self.SENet(x,name="SE_6")
            
            x=self.FC_conv(x,512,name="FC_1")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,512,name="FC_2")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,10,activation=None,bn=False,name="FC_3")
            return tf.reshape(x,[-1,10])
    x=tf.placeholder(tf.float32,[None,128,128,3])
    y=tf.placeholder(tf.float32,[None,10])
    is_training=tf.placeholder(tf.bool)
    global_step=tf.Variable(0,trainable=False)
    train_epochs=160
    batch_size=64
    batch_num=int(50000/batch_size)
    SE_rate=32
    cardinality=8
    regularizer_rate=0.01
    drop_rate=0.5
    # data input
    data=DatasetReader(data_path,[128,128])
    train_batch=data.train_batch(batch_size=batch_size)
    test_batch=data.test_batch(batch_size=256)
    # learning rate
    warmup_step=batch_num*3
    learning_rate_base=0.003
    learning_rate_decay=0.97
    learning_rate_step=batch_num
    learning_rate=exponential_decay_with_warmup(warmup_step,learning_rate_base,global_step,learning_rate_step,learning_rate_decay)
    # forward pass
    forward=ConvNet(training=is_training,regularizer_rate=regularizer_rate,drop_rate=drop_rate,SE_rate=SE_rate,cardinality=cardinality).ResNeXt(x)
    # moving average & prediction
    ema_decay=0.999
    ema=tf.train.ExponentialMovingAverage(ema_decay,global_step)
    ema_op=ema.apply(tf.trainable_variables())
    ema_forward=ConvNet(SE_rate=SE_rate,cardinality=cardinality).ResNeXt(x)
    prediction=tf.nn.softmax(ema_forward)
    correct_pred=tf.equal(tf.argmax(prediction,1),tf.argmax(y,1))
    accuracy=tf.reduce_mean(tf.cast(correct_pred,tf.float32))
    # optimizer
    momentum=0.9
    cross_entropy=tf.losses.softmax_cross_entropy(onehot_labels=y,logits=forward,label_smoothing=0.1)
    l2_regularizer_loss=tf.add_n(tf.get_collection("losses")) # L2 regularization loss
    loss_function=cross_entropy+l2_regularizer_loss
    update_ops=tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer=tf.train.MomentumOptimizer(learning_rate,momentum).minimize(loss_function,global_step=global_step)
    # model save directory
    if not os.path.exists(mode_path):
        os.makedirs(mode_path)
    saver=tf.train.Saver(tf.global_variables())
    

    Network training:

    gpu_options = tf.GPUOptions(allow_growth=True)
    loss_list,acc_list=[5.0],[0.0]
    step,decay,acc,loss=0,0,0,0
    maxacc,minloss=0.5,1e5
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        print("train start")
        sess.run(data.global_variables_initializer())
        for epoch in range(train_epochs):
            bar=tqdm(range(batch_num),unit=" step",desc="epoch %d:"%(epoch+1))
            bar.set_postfix({"acc:":acc,"loss:":loss})
            for batch in bar:
                images,labels=sess.run(train_batch)
                sess.run(optimizer,feed_dict={x:images,y:labels,is_training:True})
                step+=1
                if step%50==0:
                    decay=min(0.9,(step/50+1)/(step/50+10))
                    images,labels=sess.run(test_batch)
                    acc,loss=sess.run([accuracy,loss_function],feed_dict={x:images,y:labels,is_training:False})
                    acc_list.append(decay*acc_list[-1]+(1-decay)*acc)
                    loss_list.append(decay*loss_list[-1]+(1-decay)*loss)
                    bar.set_postfix({"acc:":acc,"loss:":loss})
            print("update model with acc:%.3f%%"%(acc_list[-1]*100))  
            #pred=sess.run(prediction,feed_dict={x:images,is_training:False})
            #print(pred)
    

    Visualize the loss and accuracy curves

    plt.plot(acc_list)
    plt.legend(["ResNet"],loc="lower right")
    plt.show()
    plt.plot(loss_list)
    plt.legend(["ResNet"],loc="upper right")
    np.save("ResNet_acc.npy",acc_list)
    np.save("ResNet_loss.npy",loss_list)
    

    Use the model to make predictions:

    # raw_images / std_images come from local image files: std_images are fed into the network, raw_images keep the original pictures
    image_path="C:\\Users\\Shijunfeng\\tensorflow_gpu\\dataseats\\kaggle\\Img"
    image_list=[os.path.join(image_path,name)for name in os.listdir(image_path)]
    def imshows(classes,images,labels,index,amount,predictions=None): # display images with their corresponding labels
        fig=plt.gcf()
        fig.set_size_inches(10,50) # adjust the figure size as needed
        for i in range(amount):
            title="lab:"+classes[np.argmax(labels[index+i])]
            if predictions is not None:
                title=title+" prd:"+classes[np.argmax(predictions[index+i])]
            ax=plt.subplot(15,2,i+1) # subplot grid of 15 rows x 2 columns
            ax.set_title(title)
            ax.set_xticks([])
            ax.set_yticks([])
            ax.imshow(images[index+i])
        plt.show()
    def decode(image):
        image=Image.open(image)
        image=tf.image.resize_images(image,[128,128],method=2)
        image=tf.image.per_image_standardization(image)
        return image
    raw_images=[Image.open(image)for image in image_list]
    std_images=[decode(image)for image in image_list]
    gpu_options = tf.GPUOptions(allow_growth=True)
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess: # load the model; the network definition above is still required here
        sess.run(data.global_variables_initializer())
        ckp_state=tf.train.get_checkpoint_state(mode_path)
        if ckp_state and ckp_state.model_checkpoint_path:
            saver.restore(sess,ckp_state.model_checkpoint_path)
        images=sess.run(std_images)
        labels=sess.run(prediction,feed_dict={x:images,is_training:False})
        imshows(data.classes,images,labels,0,10) # show the first 10 images with their predicted labels
    

    Final test results:

    Accuracy on the training set approaches 100% (the model overfits later in training; so far I have not solved the overfitting problem)
    Accuracy on the test set reaches above 90%
    Total training time was about 81 minutes
    ResNet101
    Simply keep stacking residual blocks

      def ResNet(self,x):
            x=self.Conv2d(x,64,[7,7],[2,2],name="conv1")
            x=self.MaxPool(x,[3,3],[2,2],name="pool1")
            
            x=self.Residual(x,[64,64,256],name="Residual1_1")
            x=self.Residual(x,[64,64,256],name="Residual1_2")
            x=self.Residual(x,[64,64,256],name="Residual1_3")
            x=self.MaxPool(x,[3,3],[2,2],name="pool2")
            
            x=self.Residual(x,[128,128,512],name="Residual2_1")
            x=self.Residual(x,[128,128,512],name="Residual2_2")
            x=self.Residual(x,[128,128,512],name="Residual2_3")
            x=self.Residual(x,[128,128,512],name="Residual2_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool3")
            
            x=self.Residual(x,[256,256,1024],name="Residual3_1")
            x=self.Residual(x,[256,256,1024],name="Residual3_2")
            x=self.Residual(x,[256,256,1024],name="Residual3_3")
            x=self.Residual(x,[256,256,1024],name="Residual3_4")
            x=self.Residual(x,[256,256,1024],name="Residual3_5")
            x=self.Residual(x,[256,256,1024],name="Residual3_6")
            x=self.Residual(x,[256,256,1024],name="Residual3_7")
            x=self.Residual(x,[256,256,1024],name="Residual3_8")
            x=self.Residual(x,[256,256,1024],name="Residual3_9")
            x=self.Residual(x,[256,256,1024],name="Residual3_10")
            x=self.Residual(x,[256,256,1024],name="Residual3_11")
            x=self.Residual(x,[256,256,1024],name="Residual3_12")
            x=self.Residual(x,[256,256,1024],name="Residual3_13")
            x=self.Residual(x,[256,256,1024],name="Residual3_14")
            x=self.Residual(x,[256,256,1024],name="Residual3_15")
            x=self.Residual(x,[256,256,1024],name="Residual3_16")
            x=self.Residual(x,[256,256,1024],name="Residual3_17")
            x=self.Residual(x,[256,256,1024],name="Residual3_18")
            x=self.Residual(x,[256,256,1024],name="Residual3_19")
            x=self.Residual(x,[256,256,1024],name="Residual3_20")
            x=self.Residual(x,[256,256,1024],name="Residual3_21")
            x=self.Residual(x,[256,256,1024],name="Residual3_22")
            x=self.Residual(x,[256,256,1024],name="Residual3_23")
            x=self.MaxPool(x,[3,3],[2,2],name="pool4")
            
            x=self.Residual(x,[512,512,2048],name="Residual4_1")
            x=self.Residual(x,[512,512,2048],name="Residual4_2")
            x=self.Residual(x,[512,512,2048],name="Residual4_3")
            x=self.Residual(x,[512,512,2048],name="Residual4_4")
            x=self.MaxPool(x,[3,3],[2,2],name="pool5")
            
            x=self.GlobalAvgPool(x,name="GAP")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,784,name="FC1")
            x=self.Dropout(x)
            
            x=self.FC_conv(x,10,activation=None,use_bn=False,name="output")
            
            return tf.reshape(x,[-1,10])
    

    The test-set accuracy stays stably above 90%
    Averaging the last 20 epochs gives an accuracy of 90.945% (about 91%)
    The highest accuracy reached was 94% (batch_size=256)
    SE-ResNet101
    Just add the SENet sub-module after each residual block

        def Residual(self,x,channels,name="Residual"):
            with tf.variable_scope(name,reuse=tf.AUTO_REUSE):
                res=self.Conv2d(x,channels[0],[1,1],name="conv_1")
                res=self.Conv2d(res,channels[1],[3,3],name="conv_2")
                res=self.Conv2d(res,channels[2],[1,1],activation=None,use_bn=True,name="conv_3")
                res=self.SENet(res)
                if x.shape.as_list()[-1]!=channels[2]:
                    x=self.Conv2d(x,channels[2],[1,1],activation=None,use_bn=False,name="conv_linear")
                return tf.nn.relu(res+x)
    

    Results are not available yet; to be updated

  • (Overview) A complete TensorFlow code framework for data input, training and prediction with SE-ResNet (SENet, ResNet, ResNeXt, VGG16), reaching 90% accuracy on CIFAR10. balabalabala My earlier code still felt too messy; when I first started learning I was using TensorFlow 1.2, and now that 2.5 is about to...

    Some earlier notes
    A general workflow for reading datasets
    (Overview) A complete TensorFlow code framework for data input, training and prediction with SE-ResNet (SENet, ResNet, ResNeXt, VGG16), reaching 90% accuracy on CIFAR10

    balabalabala

    My earlier code still felt too messy; when I first started learning I was using TensorFlow 1.2
    Now that 2.5 is about to come out, I reorganized everything
    This time I use TensorFlow 1.15 and rely on the tf.keras and tf.data APIs throughout
    The code looks a lot cleaner now
    A tf.keras model supports passing NumPy data in directly for prediction
    But in this code I still use tf.data, which makes preprocessing and data augmentation convenient (mainly the augmentation)
    ResNet50 + SENet, a lightweight attention network
    The data augmentation part uses the usual operations such as random flips and random crops
    In addition, CutOut augmentation is added: several blocks of a given size are generated at random positions to occlude parts of the image

    class Cutout(object):
        """Randomly mask out one or more patches from an image.
        Args:
            n_holes (int): Number of patches to cut out of each image.
            length (int): The length (in pixels) of each square patch.
        """
        def __init__(self, n_holes, length):
            self.n_holes=n_holes
            self.length=length
        def __call__(self, img):
            """
            Args:
                img (Tensor): Tensor image of size (C, H, W).
            Returns:
                Tensor: Image with n_holes of dimension length x length cut out of it.
            """
            h=img.shape[0]
            w=img.shape[1]
            mask=np.ones((h, w), np.float32)
            for n in range(self.n_holes):
                y=np.random.randint(h)
                x=np.random.randint(w)
                y1=np.clip(y-self.length//2,0,h)
                y2=np.clip(y+self.length//2,0,h)
                x1=np.clip(x-self.length//2,0,w)
                x2=np.clip(x+self.length//2,0,w)
                mask[y1: y2, x1: x2] = 0.
            mask=np.expand_dims(mask,axis=-1)
            img=img*mask
            return img
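
    As a quick sanity check of the augmentation above (this snippet is my own illustration, not part of the original pipeline; the input is just a random NumPy array standing in for a CIFAR10 image), Cutout can be called directly on an array and the number of masked pixels counted:

    import numpy as np

    np.random.seed(0)
    image=np.random.uniform(0.0,1.0,size=(32,32,3)).astype(np.float32) # fake 32x32 RGB image
    cutout=Cutout(n_holes=2,length=8)                                  # two 8x8 holes, the same settings used later via tf.py_func
    augmented=cutout(image)
    masked=np.sum(np.all(augmented==0.0,axis=-1))                      # pixels zeroed in every channel
    print("pixels masked out: %d / %d"%(masked,image.shape[0]*image.shape[1]))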
    

    Data preprocessing + data augmentation

    class DatasetReader(object):
        def __init__(self,data_path,image_size=None):
            self.data_path=data_path
            self.img_size=image_size
            self.img_size.append(3)
            self.train_path=os.path.join(data_path,"train")
            self.test_path=os.path.join(data_path,"test")
            self.TF_path=os.path.join(data_path,"TFRecordData")
            self.tf_train_path=os.path.join(self.TF_path,"train")
            self.tf_test_path=os.path.join(self.TF_path,"test")
            self.classes=os.listdir(self.train_path)
            self.__Makedirs()
            self.train_batch_initializer=None
            self.test_batch_initializer=None
            self.__CreateTFRecord(self.train_path,self.tf_train_path) 
            self.__CreateTFRecord(self.test_path,self.tf_test_path)
        def __CreateTFRecord(self,read_path,save_path): # create the TFRecord file
            path=os.path.join(save_path,"data.TFRecord")
            if os.path.exists(path):
                print("find file "+(os.path.join(save_path,"data.TFRecords")))
                return
            else:
                print("cannot find file %s,ready to recreate"%(os.path.join(save_path,"data.TFRecords")))
            writer=tf.python_io.TFRecordWriter(path=path)
            image_path=[]
            image_label=[]
            for label,class_name in enumerate(self.classes):
                class_path=os.path.join(read_path,class_name)
                for image_name in os.listdir(class_path):
                    image_path.append(os.path.join(class_path,image_name))
                    image_label.append(label)
            for i in range(5):image_path,image_label=shuffle(image_path,image_label)
            bar=Progbar(len(image_path))
            for i in range(len(image_path)):
                image,label=Image.open(image_path[i]).convert("RGB"),image_label[i]
                image=image.convert("RGB")
                image=image.tobytes()
                example=tf.train.Example(features=tf.train.Features(feature={
                            "label":tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                            "image":tf.train.Feature(bytes_list=tf.train.BytesList(value=[image]))
                        }))
                writer.write(example.SerializeToString())
                bar.update(i+1)
            writer.close()
        def __Makedirs(self):
            if not os.path.exists(self.TF_path):
                os.makedirs(self.TF_path)
            if not os.path.exists(self.tf_train_path):
                os.makedirs(self.tf_train_path)
            if not os.path.exists(self.tf_test_path):
                os.makedirs(self.tf_test_path)
        def __parsed(self,tensor):
            feature=tf.parse_single_example(tensor,features={
                        "image":tf.FixedLenFeature([],tf.string),
                        "label":tf.FixedLenFeature([],tf.int64)
                })
            image=tf.decode_raw(feature["image"],tf.uint8)
            image=tf.reshape(image,self.img_size)
            image=tf.random_crop(image,self.img_size)
            image=image/255
            
            label=tf.cast(feature["label"],tf.int32)
            label=tf.one_hot(label,len(self.classes))
            return image,label
        def __parsed_distorted(self,tensor):
    
            feature=tf.parse_single_example(tensor,features={
                        "image":tf.FixedLenFeature([],tf.string),
                        "label":tf.FixedLenFeature([],tf.int64)
                })
            image=tf.decode_raw(feature["image"],tf.uint8)
            image=tf.reshape(image,self.img_size)
    
            image=tf.image.random_flip_left_right(image)
            image=tf.image.random_flip_up_down(image)
            image=tf.image.random_brightness(image,max_delta=0.2)
            image=tf.image.random_hue(image,max_delta=0.2)
            image=tf.image.random_contrast(image,lower=0.75,upper=1.25)
            image=tf.image.random_saturation(image,lower=0.75,upper=1.25)
            image=image/255
            
            random=tf.random_uniform(shape=[],minval=0,maxval=1,dtype=tf.float32)
            random_shape=tf.random_uniform(shape=[2],minval=32,maxval=48,dtype=tf.int32)
            
            croped_image=tf.image.resize_image_with_crop_or_pad(image,40,40)
            croped_image=tf.image.resize_images(croped_image,random_shape,method=2)
            
            croped_image=tf.random_crop(croped_image,[32,32,3])
            
            image=tf.cond(random<0.25,lambda:image,lambda:croped_image)
            image=tf.reshape(image,[32,32,3])
    
            random=tf.random_uniform(shape=[],minval=0,maxval=1,dtype=tf.float32)
            cut_image=tf.py_func(Cutout(2,8),[image],tf.float32)
            image=tf.cond(random<0.25,lambda:image,lambda:cut_image)
            image=tf.reshape(image,[32,32,3])
            
            image=tf.reshape(image,shape=[32,32,3])
            label=tf.cast(feature["label"],tf.int32)
            label=tf.one_hot(label,len(self.classes))
            return image,label
        def __get_dataset(self,path,parsed,batch_size,buffer_size=10000):
            filename=[os.path.join(path,name)for name in os.listdir(path)]
            dataset=tf.data.TFRecordDataset(filename)
            dataset=dataset.prefetch(tf.contrib.data.AUTOTUNE)
            dataset=dataset.shuffle(buffer_size=buffer_size,seed=19260817)
            dataset=dataset.repeat(count=None)
            dataset=dataset.map(parsed,num_parallel_calls=16)
            dataset=dataset.batch(batch_size=batch_size)
            dataset=dataset.apply(tf.data.experimental.prefetch_to_device("/gpu:0"))
            return dataset
        def global_variables_initializer(self):
            initializer=[]
            initializer.append(self.train_batch_initializer)
            initializer.append(self.test_batch_initializer)
            initializer.append(tf.global_variables_initializer())
            return initializer
        def test_batch(self,batch_size):
            dataset=self.__get_dataset(self.tf_test_path,self.__parsed,batch_size)
            return dataset
        def train_batch(self,batch_size):
            dataset=self.__get_dataset(self.tf_train_path,self.__parsed_distorted,batch_size)
            return dataset
    

    This part of the code is quite long; it was written a long time ago and has not been touched since, heh
    The rough workflow is:
    1. Build a TFRecord file from the image dataset folder
    Inside the dataset folder, create one sub-folder per class and put its images there
    The folder layout for the CIFAR10 dataset looks like this:
    [Screenshot: folder layout of the CIFAR10 dataset]
    2. Read the TFRecord file and create a dataset object
    Then apply data augmentation to every batch that goes into training:
    random resizing, random flips, random noise, CutOut and so on; of course only the training set needs augmentation, the validation set does not
    3. Obtain the dataset object used for training (tf.data.TFRecordDataset).
    tf.keras allows passing this dataset object in directly (1.15)
    With 1.14 you need an iterator (iterator=ds.make_one_shot_iterator())
    For even earlier versions that do not accept iterators either, you instead pass in a Tensor that yields a different batch on each call (batch=iterator.get_next()); as far as I remember this only iterates for one epoch
    With tf.keras, however, using the dataset directly works fine
    For multiple passes you can use iterator=dataset.make_initializable_iterator() together with batch=iterator.get_next()
    and you need to run sess.run(iterator.initializer) first to initialize it (see the sketch right after this list)
    4. Pass it to model.fit for training; this step is described below
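
    A minimal sketch of the manual-iteration path mentioned in step 3 (TF 1.x style; the toy dataset here is only a placeholder I made up, not the TFRecord pipeline above):

    import tensorflow as tf

    # toy dataset: 100 fake images and one-hot labels, batched into groups of 16
    dataset=tf.data.Dataset.from_tensor_slices((tf.zeros([100,32,32,3]),tf.zeros([100,10])))
    dataset=dataset.batch(16)

    iterator=dataset.make_initializable_iterator()
    next_batch=iterator.get_next()            # a pair of tensors (images, labels)

    with tf.Session() as sess:
        for epoch in range(2):
            sess.run(iterator.initializer)    # re-initialize the iterator at the start of every epoch
            while True:
                try:
                    images,labels=sess.run(next_batch)
                except tf.errors.OutOfRangeError:
                    break                     # this epoch's data is exhausted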

    Network structure

    Basic building blocks

    CBR

    The Conv-BN-ReLU block, a very classic building unit

    class Convolution2D(object):
        def __init__(self,filters,ksize,strides=1,padding="valid",activation=None,kernel_regularizer=None):
            self.filters=filters
            self.kernel_regularizer=kernel_regularizer
            self.ksize=ksize
            self.strides=strides
            self.padding=padding
            self.activation=activation
        def __call__(self,x):
            x=Conv2D(self.filters,self.ksize,self.strides,padding=self.padding,kernel_regularizer=self.kernel_regularizer)(x)
            x=BatchNormalization()(x)
            x=Activation(self.activation)(x)
            return x
    

    SENet

    A lightweight channel attention mechanism
    The SE module first applies a Squeeze operation to the convolutional feature map to obtain channel-level global features, then applies an Excitation operation to these global features to learn the relationships between channels and produce per-channel weights, which are finally multiplied with the original feature map to give the output features. Essentially, the SE module performs attention (or gating) along the channel dimension; this lets the model pay more attention to the most informative channels while suppressing the unimportant ones. The SE module is also generic, which means it can be plugged into existing network architectures.

    class SEBlock(object):
        def __init__(self,se_rate=16,l2_rate=0.001):
            self.se_rate=se_rate
            self.l2_rate=l2_rate
        def __call__(self,inputs):
            shape=inputs.shape
            squeeze=GlobalAveragePooling2D()(inputs)
            squeeze=Dense(shape[-1]//self.se_rate,activation="relu")(squeeze)
            extract=Dense(shape[-1],activation=tf.nn.sigmoid)(squeeze)
            extract=tf.expand_dims(extract,axis=1)
            extract=tf.expand_dims(extract,axis=1)
            output=tf.keras.layers.multiply([extract,inputs])
            return output
    

    ResNet

    A very classic structure, so no further explanation is needed here

    class Residual(object):
        def __init__(self,filters,kernel_size,strides=1,padding='valid',activation=None):
            self.filters=filters
            self.kernel_size=kernel_size
            self.strides=strides
            self.padding=padding
            self.activation=activation
        def __call__(self,inputs):
            x=Convolution2D(filters=self.filters[0],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(inputs)
            x=Convolution2D(filters=self.filters[1],
                            ksize=self.kernel_size,
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(x)
            x=Convolution2D(filters=self.filters[2],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=None)(x)
            if x.shape.as_list()[-1]!=inputs.shape.as_list()[-1]:
                inputs=Convolution2D(filters=self.filters[2],
                                ksize=(1,1),
                                strides=self.strides,
                                padding=self.padding,
                                activation=None)(inputs)
            x=tf.keras.layers.add([inputs,x])
            x=tf.keras.layers.Activation(self.activation)(x)
            return x
    

    SE+Res

    Add the SENet channel attention mechanism before the residual addition

    class Residual(object):
        def __init__(self,filters,kernel_size,strides=1,padding='valid',activation=None):
            self.filters=filters
            self.kernel_size=kernel_size
            self.strides=strides
            self.padding=padding
            self.activation=activation
        def __call__(self,inputs):
            x=Convolution2D(filters=self.filters[0],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(inputs)
            x=Convolution2D(filters=self.filters[1],
                            ksize=self.kernel_size,
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(x)
            x=Convolution2D(filters=self.filters[2],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=None)(x)
            x=SEBlock(se_rate=16)(x) # just add the SE block before the residual add; very convenient, heh
            if x.shape.as_list()[-1]!=inputs.shape.as_list()[-1]:
                inputs=Convolution2D(filters=self.filters[2],
                                ksize=(1,1),
                                strides=self.strides,
                                padding=self.padding,
                                activation=None)(inputs)
            x=tf.keras.layers.add([inputs,x])
            x=tf.keras.layers.Activation(self.activation)(x)
            return x
    

    Structure visualization

    To be added

    Full code of the network structure

    import tensorflow as tf
    from tensorflow.keras.layers import Flatten
    from tensorflow.keras.layers import Input
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.layers import MaxPooling2D
    from tensorflow.keras.layers import GlobalAveragePooling2D
    from tensorflow.keras.layers import Dropout
    from tensorflow.keras.layers import AveragePooling2D
    from tensorflow.keras.layers import BatchNormalization
    from tensorflow.keras.layers import Activation
    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Softmax
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.regularizers import l2
    class SEBlock(object):
        def __init__(self,se_rate=16,l2_rate=0.001):
            self.se_rate=se_rate
            self.l2_rate=l2_rate
        def __call__(self,inputs):
            shape=inputs.shape
            squeeze=GlobalAveragePooling2D()(inputs)
            squeeze=Dense(shape[-1]//self.se_rate,activation="relu")(squeeze)
            extract=Dense(shape[-1],activation=tf.nn.sigmoid)(squeeze)
            extract=tf.expand_dims(extract,axis=1)
            extract=tf.expand_dims(extract,axis=1)
            output=tf.keras.layers.multiply([extract,inputs])
            return output
    class Convolution2D(object):
        def __init__(self,filters,ksize,strides=1,padding="valid",activation=None,kernel_regularizer=None):
            self.filters=filters
            self.kernel_regularizer=kernel_regularizer
            self.ksize=ksize
            self.strides=strides
            self.padding=padding
            self.activation=activation
        def __call__(self,x):
            x=Conv2D(self.filters,self.ksize,self.strides,padding=self.padding,kernel_regularizer=self.kernel_regularizer)(x)
            x=BatchNormalization()(x)
            x=Activation(self.activation)(x)
            return x
    class Residual(object):
        def __init__(self,filters,kernel_size,strides=1,padding='valid',activation=None):
            self.filters=filters
            self.kernel_size=kernel_size
            self.strides=strides
            self.padding=padding
            self.activation=activation
        def __call__(self,inputs):
            x=Convolution2D(filters=self.filters[0],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(inputs)
            x=Convolution2D(filters=self.filters[1],
                            ksize=self.kernel_size,
                            strides=self.strides,
                            padding=self.padding,
                            activation=self.activation)(x)
            x=Convolution2D(filters=self.filters[2],
                            ksize=(1,1),
                            strides=self.strides,
                            padding=self.padding,
                            activation=None)(x)
            x=SEBlock(se_rate=16)(x)
            if x.shape.as_list()[-1]!=inputs.shape.as_list()[-1]:
                inputs=Convolution2D(filters=self.filters[2],
                                ksize=(1,1),
                                strides=self.strides,
                                padding=self.padding,
                                activation=None)(inputs)
            x=tf.keras.layers.add([inputs,x])
            x=tf.keras.layers.Activation(self.activation)(x)
            return x
    class SEResNet50(object):
        def __init__(self,se_rate=16,l2_rate=0,drop_rate=0.25):
            self.se_rate=se_rate
            self.l2_rate=l2_rate
            self.drop_rate=drop_rate
        def __call__(self,inputs):
            x=Convolution2D(32,[3,3],[1,1],activation="relu",padding="same")(inputs)
            
            x=Residual([32,32,128],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([32,32,128],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([32,32,128],[3,3],[1,1],activation="relu",padding="same")(x)
            x=MaxPooling2D([3,3],[2,2],padding="same")(x)
            
            x=Residual([64,64,256],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([64,64,256],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([64,64,256],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Dropout(self.drop_rate)(x)
            x=Residual([64,64,256],[3,3],[1,1],activation="relu",padding="same")(x)
            x=MaxPooling2D([3,3],[2,2],padding="same")(x)
            
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([128,128,512],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Dropout(self.drop_rate)(x)
            x=MaxPooling2D([3,3],[2,2],padding="same")(x)
            
            x=Residual([256,256,1024],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([256,256,1024],[3,3],[1,1],activation="relu",padding="same")(x)
            x=Residual([256,256,1024],[3,3],[1,1],activation="relu",padding="same")(x)
            x=GlobalAveragePooling2D()(x)
            x=Dropout(self.drop_rate)(x)
            
            x=Dense(10,activation="softmax")(x)
            print(self)
            return x
       
    

    Load the CIFAR10 dataset

    batch_size=128
    data_path="dataseats/cifar10/Image"
    mode_path="dataseats/cifar10/Image/TFMode"
    dataset=DatasetReader(data_path,image_size=[32,32])
    train_dataset=dataset.train_batch(batch_size=batch_size)
    test_dataset=dataset.test_batch(batch_size=batch_size)
    

    Define the model structure

    from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

    plateau=ReduceLROnPlateau(monitor="val_acc", # multiply the learning rate by 0.6 if val_acc has not improved for 12 epochs
                                    verbose=1,
                                    mode='max',
                                    factor=0.6,
                                    patience=12)
    early_stopping=EarlyStopping(monitor='val_acc',# stop training if val_acc has not improved for 70 epochs
                                       verbose=1,
                                       mode='max',
                                       patience=70)
    checkpoint=ModelCheckpoint(f'SERESNET101_CLASSIFIER_CIFAR10_DA.h5',# save the model with the highest val_acc
                               monitor='val_acc',
                               verbose=1,
                               mode='max',
                               save_weights_only=False,
                               save_best_only=True)
    inputs=Input(shape=[32,32,3]) # define the input shape
    outputs=SEResNet50()(inputs)
    model=Model(inputs,outputs)
    

    Compile and train

    model.compile(loss="categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(lr=0.001),
                  metrics=['acc'])
    trained_model=model.fit(
        train_dataset,
        steps_per_epoch=50000//batch_size,
        shuffle=True,
        epochs=300,
        validation_data=test_dataset,
        validation_steps=10000//batch_size,
        callbacks=[plateau,early_stopping,checkpoint]
    )
    

    Loss and accuracy curves

    To be added

    Evaluation and prediction

    Train the model on the training set, then measure the network's accuracy on the test set:
    On the test set, the accuracy reaches 94.05%
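
    For completeness, evaluation can also be done with the standard tf.keras API; a minimal sketch (assuming the model and test_dataset objects defined above, and the checkpoint file written by ModelCheckpoint):

    # reload the best checkpoint and measure test accuracy
    model.load_weights('SERESNET101_CLASSIFIER_CIFAR10_DA.h5')
    test_loss,test_acc=model.evaluate(test_dataset,steps=10000//batch_size)
    print("test accuracy: %.2f%%"%(test_acc*100))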

    Some additional notes

    AdamW+L2

    L2 regularization helps prevent overfitting, but Adam + L2 is not a great combination
    AdamW with decoupled weight decay is worth considering instead
    Still to be tested
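
    A possible direction (untested here; my own sketch for the TF 1.x setup above) is to compile with the decoupled-weight-decay optimizer from tf.contrib instead of Adam + L2. Note that with a native TF optimizer, Keras learning-rate callbacks such as ReduceLROnPlateau may not apply:

    # illustrative only: AdamW-style decoupled weight decay in TF 1.x
    optimizer=tf.contrib.opt.AdamWOptimizer(weight_decay=1e-4,learning_rate=0.001)
    model.compile(loss="categorical_crossentropy",optimizer=optimizer,metrics=['acc'])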

    Training with SGD

    Adam usually converges faster than SGD, but on CV tasks the final accuracy and generalization of its solutions tend to be worse than SGD's, so using SGD or SGD with momentum may give better accuracy
    I will run a comparison experiment here when I have time
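
    A drop-in way to try this with the code above (the hyperparameters are only a starting point, not tuned values):

    # illustrative only: SGD with Nesterov momentum in place of Adam
    optimizer=tf.keras.optimizers.SGD(lr=0.1,momentum=0.9,nesterov=True)
    model.compile(loss="categorical_crossentropy",optimizer=optimizer,metrics=['acc'])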

  • How to train a model from scratch in Keras that reaches 89% accuracy on CIFAR10. CIFAR10 is a classic image-recognition dataset with 10 classes of images: 60000 color images of size 32 x 32, of which 50000 are used for training and 10000 for testing...
  • CIFAR10 classification with ResNet18 + SENet on TensorFlow2: training accuracy 95.66%, test accuracy 90.77%
  • SVM accuracy on cifar-10 (cifar-10-batches-py)

    Summary: SVM reaches roughly 37% accuracy on the cifar-10 test set, while KNN on the cifar-10 test set reaches... apple@appledeMacBook-Pro-2:~/work/cs231n/SVM_Cifar10$ python3 svm_cifar10.py datasets/cifar-10-batches-...
  • Training Cifar10 or Cifar100 with ResNet, SENet and Inception networks. Training data: Cifar10 or Cifar100. Training-set accuracy: about 97.11%. Validation-set accuracy: about 90.22%. Test-set accuracy: 88.6%. Training time on GPU: a little over an hour. Weights...
  • https://zhuanlan.zhihu.com/p/29214791 CIFAR10 is a classic image-recognition dataset with 10 classes of images. ...[CIFAR10] Among the major classic image-recognition datasets (MNIST / CIFAR10 / CIFAR100 / STL-10 / SVHN / ImageN...
  • vgg_cifar10.py

    vgg_cifar10.py is an excellent deep convolutional neural network whose CIFAR10 accuracy reaches 89%. The weight file can be found in my downloads: http://download.csdn.net/download/qq_30803353/10158302
  • CIFAR10 download address: http://www.cs.toronto.edu/~kriz/cifar.html, about 162 MB of data. Common ResNet network structures include... The main steps are: define the network structure, train and test the network, check accuracy with the test set, display training accuracy, test...
  • CIFAR10 contains ten classes of images: 60000 color images of size 32x32, of which 50000 are for training and 10000 for testing. The dataset is split into 5 training batches and 1 test batch, each with 10000 images, containing the following data: data — one batch contains one...
  • [Deep learning] Cifar-10: exploring how different improvement strategies raise classification accuracy
  • Current tests reach at least 91% accuracy when training on cifar10 (due to limited time, only 30 epochs were trained and testing is not exhaustive). My implementation replaces the first fully connected layer with global average pooling and the second and third fully connected layers with 1x1 convolutions; a BN layer is added after each group of convolutions, and the convolution channels...
  • # The train/test net protocol buffer definition net: "D:\\CaffeInfo\\D_TrainVal\\cifar10_full_train_test.prototxt" # test_iter specifies how many forward passes the test should carry out. # In the cas
  • PyTorch in practice 2: ResNet-18 for Cifar-10 image classification. Experimental environment: Pytorch 0.4.0, torchvision 0.2.1, Python 3.6, CUDA8 + cuDNN v7 (optional), Win10 + Pycharm. Full project code: click here. ResNet-18 network structure: ResN...
