[Keras] Semantic Segmentation of Remote Sensing Images with SegNet and U-Net


    From: [Keras] Semantic segmentation of remote sensing images based on SegNet and U-Net

    Two months ago I took part in a competition on semantic segmentation of high-resolution remote sensing images, grandly titled "Eye in the Sky". For the final project of our data mining course these past two weeks, our group also chose remote sensing image semantic segmentation as the topic, so I took the chance to reorganize and extend my earlier work and wrote this post to record the complete deep-learning workflow for remote sensing image segmentation, along with some useful ideas and tricks.

     

    Dataset

    First, the data. The dataset comes from the CCF Big Data competition (high-resolution remote sensing imagery of a city in southern China from 2015). It is a small dataset containing 5 annotated large RGB remote sensing images (ranging in size from 3000×3000 to 6000×6000). Four object classes are annotated: vegetation (label 1), buildings (label 2), water (label 3), and roads (label 4), with everything else labeled 0. Cropland, forest, and grassland are all grouped into the vegetation class. To inspect the annotations more easily, we visualized three of the training images as follows: blue - water, yellow - buildings, green - vegetation, brown - roads. More details about the data can be found here.

    Now for the data processing steps. We have 5 large remote sensing images that cannot be fed into the network directly: they would not fit in memory, and their sizes all differ. So we first crop them randomly, i.e. generate random (x, y) coordinates and cut out a 256×256 patch at each, and then apply the following augmentations:

    1. Rotate both the image and the label: 90°, 180°, 270°
    2. Mirror both the image and the label along the y axis
    3. Blur the original image only
    4. Adjust the lighting (gamma) of the original image only
    5. Add noise to the original image only (Gaussian noise, salt-and-pepper noise)

    Instead of Keras' built-in data augmentation utilities, I wrote the corresponding augmentation functions myself with OpenCV.

    import random

    import cv2
    import numpy as np
    from tqdm import tqdm

    img_w = 256
    img_h = 256

    image_sets = ['1.png', '2.png', '3.png', '4.png', '5.png']

    def gamma_transform(img, gamma):
        # look-up table that maps every gray level through the gamma curve
        gamma_table = [np.power(x / 255.0, gamma) * 255.0 for x in range(256)]
        gamma_table = np.round(np.array(gamma_table)).astype(np.uint8)
        return cv2.LUT(img, gamma_table)

    def random_gamma_transform(img, gamma_vari):
        # sample gamma log-uniformly in [1/gamma_vari, gamma_vari]
        log_gamma_vari = np.log(gamma_vari)
        alpha = np.random.uniform(-log_gamma_vari, log_gamma_vari)
        gamma = np.exp(alpha)
        return gamma_transform(img, gamma)

    def rotate(xb, yb, angle):
        # rotate the image and its label around the patch center by the same angle
        M_rotate = cv2.getRotationMatrix2D((img_w / 2, img_h / 2), angle, 1)
        xb = cv2.warpAffine(xb, M_rotate, (img_w, img_h))
        yb = cv2.warpAffine(yb, M_rotate, (img_w, img_h))
        return xb, yb

    def blur(img):
        img = cv2.blur(img, (3, 3))
        return img

    def add_noise(img):
        for i in range(200):  # add salt noise at 200 random pixels
            temp_x = np.random.randint(0, img.shape[0])
            temp_y = np.random.randint(0, img.shape[1])
            img[temp_x][temp_y] = 255
        return img

    def data_augment(xb, yb):
        if np.random.random() < 0.25:
            xb, yb = rotate(xb, yb, 90)
        if np.random.random() < 0.25:
            xb, yb = rotate(xb, yb, 180)
        if np.random.random() < 0.25:
            xb, yb = rotate(xb, yb, 270)
        if np.random.random() < 0.25:
            xb = cv2.flip(xb, 1)  # flipcode > 0: mirror along the y axis
            yb = cv2.flip(yb, 1)

        if np.random.random() < 0.25:
            xb = random_gamma_transform(xb, 1.0)

        if np.random.random() < 0.25:
            xb = blur(xb)

        if np.random.random() < 0.2:
            xb = add_noise(xb)

        return xb, yb

    def creat_dataset(image_num=100000, mode='original'):
        print('creating dataset...')
        image_each = image_num // len(image_sets)  # integer number of crops per source image
        g_count = 0
        for i in tqdm(range(len(image_sets))):
            count = 0
            src_img = cv2.imread('./data/src/' + image_sets[i])  # 3 channels
            label_img = cv2.imread('./data/label/' + image_sets[i], cv2.IMREAD_GRAYSCALE)  # single channel
            X_height, X_width, _ = src_img.shape
            while count < image_each:
                random_width = random.randint(0, X_width - img_w - 1)
                random_height = random.randint(0, X_height - img_h - 1)
                src_roi = src_img[random_height: random_height + img_h, random_width: random_width + img_w, :]
                label_roi = label_img[random_height: random_height + img_h, random_width: random_width + img_w]
                if mode == 'augment':
                    src_roi, label_roi = data_augment(src_roi, label_roi)

                # scale the 0-4 labels to 0-200 so the annotation is visible by eye
                visualize = (label_roi * 50).astype(np.uint8)

                cv2.imwrite(('./aug/train/visualize/%d.png' % g_count), visualize)
                cv2.imwrite(('./aug/train/src/%d.png' % g_count), src_roi)
                cv2.imwrite(('./aug/train/label/%d.png' % g_count), label_roi)
                count += 1
                g_count += 1
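
    A hypothetical driver call for the function above (the actual entry point is not shown in the post) would generate the augmented crops in one go:

    # Hypothetical entry point: produce 100,000 augmented 256×256 crops with the function above.
    if __name__ == '__main__':
        creat_dataset(image_num=100000, mode='augment')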

    After the augmentation above we end up with a fairly large training set: 100,000 patches of 256×256.

    Convolutional neural networks

    For this kind of semantic segmentation task there are many classic networks to choose from: FCN, U-Net, SegNet, DeepLab, RefineNet, Mask R-CNN, HED. All of them are well established and widely used in competitions, so we can pick one or two of them as the backbone of our solution. Given our group's situation, we chose U-Net and SegNet as the main networks for our experiments.

    SegNet

    SegNet has been around for a few years now. It is not the newest or best-performing segmentation network, but its structure is clear and easy to understand, and it trains quickly with few pitfalls, so we adopted it for this task as well. SegNet uses an elegant encoder-decoder architecture. It is also worth noting that a dense CRF module is often appended as a post-processing step to further refine the segmentation boundaries. If you want to dig deeper, see here.

    Now for the code. First we define the SegNet architecture.

    def SegNet():  
        model = Sequential()  
        #encoder (the (3, img_w, img_h) input shape assumes Keras image_data_format='channels_first')
        model.add(Conv2D(64,(3,3),strides=(1,1),input_shape=(3,img_w,img_h),padding='same',activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(64,(3,3),strides=(1,1),padding='same',activation='relu'))  
        model.add(BatchNormalization())  
        model.add(MaxPooling2D(pool_size=(2,2)))  
        #(128,128)  
        model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(MaxPooling2D(pool_size=(2, 2)))  
        #(64,64)  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(MaxPooling2D(pool_size=(2, 2)))  
        #(32,32)  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(MaxPooling2D(pool_size=(2, 2)))  
        #(16,16)  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(MaxPooling2D(pool_size=(2, 2)))  
        #(8,8)  
        #decoder  
        model.add(UpSampling2D(size=(2,2)))  
        #(16,16)  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(UpSampling2D(size=(2, 2)))  
        #(32,32)  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(UpSampling2D(size=(2, 2)))  
        #(64,64)  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(UpSampling2D(size=(2, 2)))  
        #(128,128)  
        model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(UpSampling2D(size=(2, 2)))  
        #(256,256)  
        model.add(Conv2D(64, (3, 3), strides=(1, 1), input_shape=(3,img_w, img_h), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu'))  
        model.add(BatchNormalization())  
        model.add(Conv2D(n_label, (1, 1), strides=(1, 1), padding='same'))  
        model.add(Reshape((n_label,img_w*img_h)))  
        # swap axis 1 and axis 2, equivalent to np.swapaxes(layer, 1, 2)
        model.add(Permute((2,1)))  
        model.add(Activation('softmax'))  
        model.compile(loss='categorical_crossentropy',optimizer='sgd',metrics=['accuracy'])  
        model.summary()  
        return model  

    Next we read in the dataset. We hold out a quarter (0.25) of the images as the validation set.

    def get_train_val(val_rate = 0.25):
        train_url = []    
        train_set = []
        val_set  = []
        for pic in os.listdir(filepath + 'src'):
            train_url.append(pic)
        random.shuffle(train_url)
        total_num = len(train_url)
        val_num = int(val_rate * total_num)
        for i in range(len(train_url)):
            if i < val_num:
                val_set.append(train_url[i]) 
            else:
                train_set.append(train_url[i])
        return train_set,val_set
        
    # data for training  
    def generateData(batch_size,data=[]):  
        #print 'generateData...'
        while True:  
            train_data = []  
            train_label = []  
            batch = 0  
            for i in (range(len(data))): 
                url = data[i]
                batch += 1 
                #print (filepath + 'src/' + url)
                #img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))  
                img = load_img(filepath + 'src/' + url)
                img = img_to_array(img) 
                # print img
                # print img.shape  
                train_data.append(img)  
                #label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h),grayscale=True)
                label = load_img(filepath + 'label/' + url, grayscale=True)
                label = img_to_array(label).reshape((img_w * img_h,))  
                # print label.shape  
                train_label.append(label)  
                if batch % batch_size==0: 
                    #print 'get enough bacth!\n'
                    train_data = np.array(train_data)  
                    train_label = np.array(train_label).flatten()  
                    train_label = labelencoder.transform(train_label)  
                    train_label = to_categorical(train_label, num_classes=n_label)  
                    train_label = train_label.reshape((batch_size,img_w * img_h,n_label))  
                    yield (train_data,train_label)  
                    train_data = []  
                    train_label = []  
                    batch = 0  
     
    # data for validation 
    def generateValidData(batch_size,data=[]):  
        #print 'generateValidData...'
        while True:  
            valid_data = []  
            valid_label = []  
            batch = 0  
            for i in (range(len(data))):  
                url = data[i]
                batch += 1  
                #img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))
                img = load_img(filepath + 'src/' + url)
                #print img
                #print (filepath + 'src/' + url)
                img = img_to_array(img)  
                # print img.shape  
                valid_data.append(img)  
                #label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h),grayscale=True)
                label = load_img(filepath + 'label/' + url, grayscale=True)
                label = img_to_array(label).reshape((img_w * img_h,))  
                # print label.shape  
                valid_label.append(label)  
                if batch % batch_size==0:  
                    valid_data = np.array(valid_data)  
                    valid_label = np.array(valid_label).flatten()  
                    valid_label = labelencoder.transform(valid_label)  
                    valid_label = to_categorical(valid_label, num_classes=n_label)  
                    valid_label = valid_label.reshape((batch_size,img_w * img_h,n_label))  
                    yield (valid_data,valid_label)  
                    valid_data = []  
                    valid_label = []  
                    batch = 0  

    Next we define the training procedure. For this task we set the batch size to 16 and train for 30 epochs, saving the best model whenever it improves (save_best_only=True); when training finishes we plot the loss/accuracy curves and save them.

    def train(args): 
        EPOCHS = 30
        BS = 16
        model = SegNet()  
        modelcheck = ModelCheckpoint(args['model'],monitor='val_acc',save_best_only=True,mode='max')  
        callable = [modelcheck]  
        train_set,val_set = get_train_val()
        train_numb = len(train_set)  
        valid_numb = len(val_set)  
        print ("the number of train data is",train_numb)  
        print ("the number of val data is",valid_numb)
        H = model.fit_generator(generator=generateData(BS,train_set),steps_per_epoch=train_numb//BS,epochs=EPOCHS,verbose=1,  
                        validation_data=generateValidData(BS,val_set),validation_steps=valid_numb//BS,callbacks=callable,max_q_size=1)  
    
        # plot the training loss and accuracy
        plt.style.use("ggplot")
        plt.figure()
        N = EPOCHS
        plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
        plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
        plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")
        plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")
        plt.title("Training Loss and Accuracy on SegNet Satellite Seg")
        plt.xlabel("Epoch #")
        plt.ylabel("Loss/Accuracy")
        plt.legend(loc="lower left")
        plt.savefig(args["plot"])

    Then the long training run begins; it took close to 3 days. The resulting loss/acc curves are shown below:

    The training loss drops to around 0.1 and the training accuracy reaches about 0.9, but the validation loss and accuracy are clearly worse, which suggests some overfitting.

    Let's set that aside for now and look at the predictions.

    We need to think about how to predict an entire remote sensing image. Since the model was trained on 256×256 inputs, prediction also has to be done on 256×256 patches. The question is how to stitch the predicted patches back into one large image. The most basic scheme: zero-pad the large image so that its dimensions become multiples of 256, create an all-zero image A of the same padded size, then slide over the padded image with a stride of 256, feed each patch into the model, and write each predicted patch into the corresponding position of A. When every patch has been processed, A holds the prediction for the whole padded image; finally crop A back to the original image size to finish the prediction.

    def predict(args):
        # load the trained convolutional neural network
        print("[INFO] loading network...")
        model = load_model(args["model"])
        stride = args['stride']
        for n in range(len(TEST_SET)):
            path = TEST_SET[n]
            # load the image
            image = cv2.imread('./test/' + path)
            h, w, _ = image.shape
            # pad the image so both sides become multiples of the stride
            padding_h = (h//stride + 1) * stride
            padding_w = (w//stride + 1) * stride
            padding_img = np.zeros((padding_h, padding_w, 3), dtype=np.uint8)
            padding_img[0:h, 0:w, :] = image[:, :, :]
            padding_img = padding_img.astype("float") / 255.0
            padding_img = img_to_array(padding_img)  # (3, H, W) with the channels_first Keras config
            print('src:', padding_img.shape)
            mask_whole = np.zeros((padding_h, padding_w), dtype=np.uint8)
            for i in range(padding_h//stride):
                for j in range(padding_w//stride):
                    # image_size is the 256-pixel training patch size, defined elsewhere in the script
                    crop = padding_img[:3, i*stride:i*stride+image_size, j*stride:j*stride+image_size]
                    _, ch, cw = crop.shape
                    if ch != 256 or cw != 256:
                        print('invalid size!')
                        continue

                    crop = np.expand_dims(crop, axis=0)
                    pred = model.predict_classes(crop, verbose=2)
                    pred = labelencoder.inverse_transform(pred[0])
                    pred = pred.reshape((256, 256)).astype(np.uint8)
                    mask_whole[i*stride:i*stride+image_size, j*stride:j*stride+image_size] = pred[:, :]

            # crop the stitched mask back to the original image size before saving
            cv2.imwrite('./predict/pre' + str(n+1) + '.png', mask_whole[0:h, 0:w])

    The predicted results look like this:

    At first glance the result looks quite good, but a closer look reveals a big problem: the stitching seams are far too visible. How can this kind of boundary artifact be fixed? A very direct idea is to shrink the sliding stride used when cutting the image: with a stride of 128, for example, roughly half of each tile overlaps its neighbours, which reduces the visible seams as much as possible.
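
    As a minimal sketch of this overlapping-window idea (independent of the exact model above), suppose we have a hypothetical helper predict_patch(crop) that returns per-class probabilities of shape (n_label, 256, 256) for one channels-first patch; averaging the probabilities wherever windows overlap smooths the seams further than simply overwriting:

    # Sketch: overlapping sliding-window prediction with stride 128 instead of 256.
    # predict_patch is a hypothetical helper returning class probabilities of shape (n_label, patch, patch).
    import numpy as np

    def predict_overlap(padded_img, predict_patch, n_label, patch=256, stride=128):
        _, H, W = padded_img.shape                        # channels-first padded image (3, H, W)
        prob_sum = np.zeros((n_label, H, W), dtype=np.float32)
        weight = np.zeros((H, W), dtype=np.float32)
        for top in range(0, H - patch + 1, stride):
            for left in range(0, W - patch + 1, stride):
                prob = predict_patch(padded_img[:, top:top+patch, left:left+patch])
                prob_sum[:, top:top+patch, left:left+patch] += prob
                weight[top:top+patch, left:left+patch] += 1.0
        prob_avg = prob_sum / np.maximum(weight, 1e-6)    # average the overlapping predictions
        return prob_avg.argmax(axis=0).astype(np.uint8)   # final label map for the padded image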

    U-Net

    For this segmentation task we chose U-Net without hesitation, for a simple reason: after going through material from many similar remote sensing segmentation competitions, the vast majority of winning entries used U-Net. With that much positive evidence, picking U-Net was an easy decision.

    U-Net has many strengths; its biggest selling point is that it can be trained into a good model even on small datasets, which fits our task very well. It also trains quickly, which suits a final project that has to produce results on a short schedule. Architecturally it is quite elegant: the whole network forms a U shape, hence the name U-Net. I will not describe the architecture in detail here; if you want to dig deeper, read the paper.

    Now for the code details. First we define the U-Net architecture; the deep learning framework is still Keras.

    Note that the model trained above is a multi-class model. A better approach is actually to train one binary model per class (using binary labels), predict each of the 4 classes separately to obtain 4 prediction maps, and then overlay them into one complete 4-class prediction map; this strategy almost always beats a direct 4-way classifier. So for U-Net our plan is to train a binary model for each class and, at the end, combine the per-class predictions into a 4-class result (a sketch of how such binary labels can be derived follows).
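
    As a hedged sketch of how the per-class binary training labels can be obtained from the 0-4 label crops produced earlier (the author's exact preprocessing is not shown, and the paths below are illustrative), one can simply compare the multi-class label against each class id:

    # Sketch: derive a 0/1 label mask per class from the 0-4 label patches (paths are illustrative).
    import os
    import cv2
    import numpy as np

    CLASSES = {1: 'vegetation', 2: 'buildings', 3: 'water', 4: 'roads'}

    def make_binary_labels(label_dir='./aug/train/label', out_root='./unet_train'):
        for fname in os.listdir(label_dir):
            label = cv2.imread(os.path.join(label_dir, fname), cv2.IMREAD_GRAYSCALE)
            for class_id, class_name in CLASSES.items():
                out_dir = os.path.join(out_root, class_name, 'label')
                os.makedirs(out_dir, exist_ok=True)
                binary = (label == class_id).astype(np.uint8)  # 1 where the pixel belongs to this class
                cv2.imwrite(os.path.join(out_dir, fname), binary)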

    Define the U-Net structure. Note that the loss function here is binary_crossentropy, because each model we train is a binary classifier.

    def unet():
        inputs = Input((3, img_w, img_h))
    
        conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
        conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv1)
        pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    
        conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(pool1)
        conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv2)
        pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    
        conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(pool2)
        conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv3)
        pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    
        conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(pool3)
        conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv4)
        pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    
        conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(pool4)
        conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(conv5)
    
        up6 = concatenate([UpSampling2D(size=(2, 2))(conv5), conv4], axis=1)
        conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(up6)
        conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv6)
    
        up7 = concatenate([UpSampling2D(size=(2, 2))(conv6), conv3], axis=1)
        conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(up7)
        conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv7)
    
        up8 = concatenate([UpSampling2D(size=(2, 2))(conv7), conv2], axis=1)
        conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(up8)
        conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv8)
    
        up9 = concatenate([UpSampling2D(size=(2, 2))(conv8), conv1], axis=1)
        conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(up9)
        conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv9)
    
        conv10 = Conv2D(n_label, (1, 1), activation="sigmoid")(conv9)
        #conv10 = Conv2D(n_label, (1, 1), activation="softmax")(conv9)
    
        model = Model(inputs=inputs, outputs=conv10)
        model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])
        return model
    

    The way the data is read and organized changes slightly.

    # data for training  
    def generateData(batch_size,data=[]):  
        #print 'generateData...'
        while True:  
            train_data = []  
            train_label = []  
            batch = 0  
            for i in (range(len(data))): 
                url = data[i]
                batch += 1 
                img = load_img(filepath + 'src/' + url)
                img = img_to_array(img) 
                train_data.append(img)  
                label = load_img(filepath + 'label/' + url, grayscale=True) 
                label = img_to_array(label)
                #print label.shape  
                train_label.append(label)  
                if batch % batch_size==0: 
                    #print 'get enough bacth!\n'
                    train_data = np.array(train_data)  
                    train_label = np.array(train_label)  
    
                    yield (train_data,train_label)  
                    train_data = []  
                    train_label = []  
                    batch = 0  
     
    # data for validation 
    def generateValidData(batch_size,data=[]):  
        #print 'generateValidData...'
        while True:  
            valid_data = []  
            valid_label = []  
            batch = 0  
            for i in (range(len(data))):  
                url = data[i]
                batch += 1  
                img = load_img(filepath + 'src/' + url)
                #print img
                img = img_to_array(img)  
                # print img.shape  
                valid_data.append(img)  
                label = load_img(filepath + 'label/' + url, grayscale=True)
                label = img_to_array(label)  # convert to an array, as in the training generator
                valid_label.append(label)  
                if batch % batch_size==0:  
                    valid_data = np.array(valid_data)  
                    valid_label = np.array(valid_label)  
                    yield (valid_data,valid_label)  
                    valid_data = []  
                    valid_label = []  
                    batch = 0  

    Training: specify the output model name and the location of the training data.

    python unet.py --model unet_buildings20.h5 --data ./unet_train/buildings/
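
    The post does not show how these command-line options are parsed; a minimal argparse sketch consistent with how train(args) and predict(args) index their arguments above (the exact parser the author used is an assumption) might look like this:

    # Assumed argument parsing for the train/predict functions above (argument names follow the code).
    import argparse

    def parse_args():
        ap = argparse.ArgumentParser()
        ap.add_argument('--model', required=True, help='path of the output/input .h5 model file')
        ap.add_argument('--data', default='./unet_train/buildings/', help='training data directory')
        ap.add_argument('--plot', default='plot.png', help='where to save the loss/accuracy curves')
        ap.add_argument('--stride', type=int, default=256, help='sliding-window stride used by predict()')
        return vars(ap.parse_args())  # train()/predict() index args like a dict, e.g. args['model']

    if __name__ == '__main__':
        args = parse_args()
        train(args)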

    When predicting a single remote sensing image we run the 4 models separately, so we get 4 masks (the figure below, for instance, is the prediction of the trained buildings model). These 4 masks now need to be merged into one. How should they be combined? My idea: by looking at the per-class predictions we can tell intuitively which classes are predicted most reliably, and then rank the masks by priority, e.g. priority: building > water > road > vegetation. When a pixel is claimed by more than one of the 4 masks, we assign it the label with the highest priority. The code below illustrates the idea:

    def combind_all_mask():
        for mask_num in tqdm(range(3)):
            if mask_num == 0:
                final_mask = np.zeros((5142,5664),np.uint8)  # all-zero (black) mask with the same size as the original image
            elif mask_num == 1:
                final_mask = np.zeros((2470,4011),np.uint8)
            elif mask_num == 2:
                final_mask = np.zeros((6116,3356),np.uint8)
            #final_mask = cv2.imread('final_1_8bits_predict.png',0)
            
            if mask_num == 0:
                mask_pool = mask1_pool
            elif mask_num == 1:
                mask_pool = mask2_pool
            elif mask_num == 2:
                mask_pool = mask3_pool
            final_name = img_sets[mask_num]
            for idx,name in enumerate(mask_pool):
                img = cv2.imread('./predict_mask/'+name,0)
                height,width = img.shape
            label_value = idx+1  # corresponding label value
                for i in tqdm(range(height)):    #priority:building>water>road>vegetation
                    for j in range(width):
                        if img[i,j] == 255:
                            if label_value == 2:
                                final_mask[i,j] = label_value
                            elif label_value == 3 and final_mask[i,j] != 2:
                                final_mask[i,j] = label_value
                            elif label_value == 4 and final_mask[i,j] != 2 and final_mask[i,j] != 3:
                                final_mask[i,j] = label_value
                            elif label_value == 1 and final_mask[i,j] == 0:
                                final_mask[i,j] = label_value
                            
            cv2.imwrite('./final_result/'+final_name,final_mask)           
                    
                    
    print('combining mask...')
    combind_all_mask()            
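
    The nested per-pixel loops above are very slow on images of this size; the same priority rule can be written with NumPy boolean masks instead. A hedged, vectorized sketch (same priority building > water > road > vegetation, assuming the four binary masks for one image are already loaded as 0/255 arrays of identical shape) could look like this:

    # Vectorized sketch of the same priority merge (building > water > road > vegetation).
    # `masks` maps label value -> binary mask array (0/255), all of identical shape.
    import numpy as np

    def merge_masks_by_priority(masks):
        final_mask = np.zeros(next(iter(masks.values())).shape, dtype=np.uint8)
        for label_value in (2, 3, 4, 1):              # highest priority first
            hit = (masks[label_value] == 255) & (final_mask == 0)
            final_mask[hit] = label_value             # only fill pixels not yet claimed
        return final_mask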

    Model ensembling

    Ensemble methods are used constantly in this kind of competition; to get a good ranking the ensembling has to be done well. The basic idea here: we use two architectures, and each model is also trained and run with different hyperparameters, so we end up with many predicted mask images. We can then combine them with a per-pixel majority vote: for each pixel position across the result images, the class that receives the most votes becomes that pixel's final class. As the saying goes, "three cobblers with their wits combined equal one Zhuge Liang"; this kind of ensembling removes many obviously misclassified pixels and improves the predictions considerably.

    Code for the majority-vote strategy:

    import numpy as np
    import cv2
    import argparse
    
    RESULT_PREFIXX = ['./result1/','./result2/','./result3/']
    
    # each mask has 5 classes: 0~4
    
    def vote_per_image(image_id):
        result_list = []
        for j in range(len(RESULT_PREFIXX)):
            im = cv2.imread(RESULT_PREFIXX[j]+str(image_id)+'.png',0)
            result_list.append(im)
            
        # each pixel
        height,width = result_list[0].shape
        vote_mask = np.zeros((height,width),np.uint8)  # uint8 so the mask saves cleanly as an 8-bit PNG
        for h in range(height):
            for w in range(width):
                record = np.zeros((1,5))
                for n in range(len(result_list)):
                    mask = result_list[n]
                    pixel = mask[h,w]
                    #print('pix:',pixel)
                    record[0,pixel]+=1
               
                label = record.argmax()
                #print(label)
                vote_mask[h,w] = label
        
        cv2.imwrite('vote_mask'+str(image_id)+'.png',vote_mask)
            
    
    vote_per_image(3)

    The prediction after model ensembling:

    As you can see, the ensembled prediction is indeed noticeably better; the obviously misclassified pixels are gone.

    An extra idea: GANs

    We also thought some more about the data side. To address the small dataset we had an idea: use a generative adversarial network to generate fake satellite images and enlarge the dataset further. The hope was that training the network on fake plus real data would improve its generalization even more. The idea follows the pix2pix paper, a very interesting piece of work on image-to-image translation. It mentions generating fake aerial imagery from annotated maps, which felt refreshingly new, so we wanted to follow that line and generate our own fake satellite image dataset. The Map-to-Aerial results in the paper are quite striking.

    Our own attempt, however, did not turn out well (see the figure below; the image on the right is one of our generated fakes). There are many reasons for the poor quality, the annotations being the biggest one. Because the generated satellite images were of low quality, the idea was abandoned and the fake images were never used for training. The approach still seems feasible, though: with suitable annotations it should be possible to generate very convincing fake imagery.

    Summary

    There are many more ways to approach semantic segmentation of remote sensing imagery. The most obvious one is to implement a range of classic segmentation networks, see which performs best, and then ensemble them; as long as the ensembling is done well, the results are usually solid. With just the simple pipeline above (data augmentation, classic models, ensembling) we already reached the top 5% of the competition. There are of course further tricks that push the score higher, which I will not go into here; what matters is the overall modelling approach. The full code is available on my GitHub.

     

    Data download:

    Link: https://pan.baidu.com/s/1i6oMukH

    Password: yqj2


    I. UAVid: a UAV remote sensing semantic segmentation dataset

    UAVid is a UAV video dataset for the semantic segmentation of urban scenes. Its main features:

    • Semantic segmentation
    • 4K-resolution UAV video
    • 8 object categories
    • Street-scene environments

    Example images:

    Download address (Baidu Netdisk also supported): https://uavid.nl/

    II. Converting the labels with the UAVid toolkit (Python)

    1. After downloading the dataset, rename and reorganize the files as shown below: rename the top-level dataset directory to UAVidDataset, rename the directory uavid_train to train, and rename the remaining directories in the same way following the figure.

    2. Download the toolkit

    # first cd into your UAVidDataset directory
    # then run the following commands to download and set up the toolkit
    git clone https://github.com/YeLyuUT/UAVidToolKit.git
    cd UAVidToolKit
    python setup.py build_ext --inplace
     

    Once this is done, your files should be organized as in the figure above.

    3. Label conversion

    Run the following commands inside the UAVidDataset directory:

    python UAVidToolKit/prepareTrainIdFiles.py -s train/ -t labelimg/train/
    python UAVidToolKit/prepareTrainIdFiles.py -s valid/ -t labelimg/valid/

    If everything goes well, the folder structure will look like the following:

    Open the labelimg folder; it contains two subfolders, train and valid, which hold the label images of the training and validation sets respectively. Feel free to take a look at them.

     

    III. Writing the DataLoader scripts (PyTorch)

    Below are the data loading scripts for the training and validation sets. The training set uses three kinds of data augmentation: random scaling, random flipping, and cropping at a random position.

    import torch
    import torch.utils.data
    
    import numpy as np
    import cv2
    import os
    
    train_dirs = ["seq1/", "seq2/", "seq3/", "seq4/", "seq5/", 
                  "seq6/", "seq7/", "seq8/", "seq9/", "seq10/",
                  "seq11/", "seq12/", "seq13/", "seq14/", "seq15/",
                  "seq31/", "seq32/", "seq33/", "seq34/", "seq35/"]
    val_dirs = ["sep16/", "seq17/", "seq18/","seq19/",
                "seq20/", "seq36/", "seq37/"]
    
    class DatasetTrain(torch.utils.data.Dataset):
        def __init__(self, uavid_data_path, uavid_meta_path):
            self.img_dir = uavid_data_path + "/train/"
            self.label_dir = uavid_meta_path + "/labelimg/train/"
    
            self.img_h = 2160
            self.img_w = 3840
    
            self.new_img_h = 1536
            self.new_img_w = 1536
    
            self.examples = []
            for train_dir in train_dirs:
                train_img_dir_path = self.img_dir + train_dir + "Images/"
                label_img__dir_path = self.label_dir + train_dir
    
                file_names = os.listdir(train_img_dir_path)
                for file_name in file_names:
                    img_id = file_name.split(".png")[0]
    
                    img_path = train_img_dir_path + file_name
    
                    label_img_path = label_img__dir_path + "TrainId/" + img_id + ".png"
    
                    example = {}
                    example["img_path"] = img_path
                    example["label_img_path"] = label_img_path
                    example["img_id"] = img_id
                    self.examples.append(example)
    
            self.num_examples = len(self.examples)
    
        def __getitem__(self, index):
            example = self.examples[index]
    
            img_path = example["img_path"]
            img = cv2.imread(img_path, -1) # (shape: (2160, 3840, 3))
            # resize img without interpolation (want the image to still match
            # label_img, which we resize below):
            img = cv2.resize(img, (self.new_img_w, self.new_img_h),
                             interpolation=cv2.INTER_NEAREST) # (shape: (1536, 1536, 3))
    
            label_img_path = example["label_img_path"]
            label_img = cv2.imread(label_img_path, cv2.IMREAD_GRAYSCALE) # (shape: (2160, 3840))
            # resize label_img without interpolation (want the resulting image to
            # still only contain pixel values corresponding to an object class):
            label_img = cv2.resize(label_img, (self.new_img_w, self.new_img_h),
                                   interpolation=cv2.INTER_NEAREST) # (shape: (1536, 1536))
    
            # flip the img and the label with 0.5 probability:
            flip = np.random.randint(low=0, high=2)
            if flip == 1:
                img = cv2.flip(img, 1)
                label_img = cv2.flip(label_img, 1)
    
            ########################################################################
            # randomly scale the img and the label:
            ########################################################################
            scale = np.random.uniform(low=0.7, high=2.0)
            new_img_h = int(scale*self.new_img_h)
            new_img_w = int(scale*self.new_img_w)
    
            # resize img without interpolation (want the image to still match
            # label_img, which we resize below):
            img = cv2.resize(img, (new_img_w, new_img_h),
                             interpolation=cv2.INTER_NEAREST) # (shape: (new_img_h, new_img_w, 3))
    
            # resize label_img without interpolation (want the resulting image to
            # still only contain pixel values corresponding to an object class):
            label_img = cv2.resize(label_img, (new_img_w, new_img_h),
                                   interpolation=cv2.INTER_NEAREST) # (shape: (new_img_h, new_img_w))
            ########################################################################
    
            # # # # # # # # debug visualization START
            # print (scale)
            # print (new_img_h)
            # print (new_img_w)
            #
            # cv2.imshow("test", img)
            # cv2.waitKey(0)
            #
            # cv2.imshow("test", label_img)
            # cv2.waitKey(0)
            # # # # # # # # debug visualization END
    
            ########################################################################
            # select a 768x768 random crop from the img and label:
            ########################################################################
            start_x = np.random.randint(low=0, high=(new_img_w - 768))
            end_x = start_x + 768
            start_y = np.random.randint(low=0, high=(new_img_h - 768))
            end_y = start_y + 768
    
    
            img = img[start_y:end_y, start_x:end_x] # (shape: (768, 768, 3))
            label_img = label_img[start_y:end_y, start_x:end_x] # (shape: (768, 768))
            ########################################################################
    
            # # # # # # # # debug visualization START
            # print (img.shape)
            # print (label_img.shape)
            #
            # cv2.imshow("test", img)
            # cv2.waitKey(0)
            #
            # cv2.imshow("test", label_img)
            # cv2.waitKey(0)
            # # # # # # # # debug visualization END
    
            # normalize the img (with the mean and std for the pretrained ResNet):
            img = img/255.0
            img = img - np.array([0.485, 0.456, 0.406])
            img = img/np.array([0.229, 0.224, 0.225]) # (shape: (768, 768, 3))
            img = np.transpose(img, (2, 0, 1)) # (shape: (3, 768, 768))
            img = img.astype(np.float32)
    
            # convert numpy -> torch:
            img = torch.from_numpy(img) # (shape: (3, 768, 768))
            label_img = torch.from_numpy(label_img) # (shape: (768, 768))
    
            return (img, label_img)
    
        def __len__(self):
            return self.num_examples
    
    class DatasetVal(torch.utils.data.Dataset):
        def __init__(self, uavid_data_path, uavid_meta_path):
            self.img_dir = uavid_data_path + "/valid/"
            self.label_dir = uavid_meta_path + "/labelimg/valid/"
    
            self.img_h = 2160
            self.img_w = 3840
    
            self.new_img_h = 768
            self.new_img_w = 768
    
            self.examples = []
            for val_dir in val_dirs:
                val_img_dir_path = self.img_dir + val_dir + "Images/"
                label_img__dir_path = self.label_dir + val_dir 
    
                file_names = os.listdir(val_img_dir_path)
                for file_name in file_names:
                    img_id = file_name.split(".png")[0]
    
                    img_path = val_img_dir_path + file_name 
    
                    label_img_path = label_img__dir_path + "TrainId/" + img_id + ".png"
                    # label_img = cv2.imread(label_img_path, -1) # (shape: (1024, 2048))
    
                    example = {}
                    example["img_path"] = img_path
                    example["label_img_path"] = label_img_path
                    example["img_id"] = img_id
                    self.examples.append(example)
    
            self.num_examples = len(self.examples)
    
        def __getitem__(self, index):
            example = self.examples[index]
    
            img_id = example["img_id"]
    
            img_path = example["img_path"]
            img = cv2.imread(img_path, -1) # (shape: (2160, 3840, 3))
            # resize img without interpolation (want the image to still match
            # label_img, which we resize below):
            img = cv2.resize(img, (self.new_img_w, self.new_img_h),
                             interpolation=cv2.INTER_NEAREST) # (shape: (768, 768, 3))
    
            label_img_path = example["label_img_path"]
            label_img = cv2.imread(label_img_path, cv2.IMREAD_GRAYSCALE) # (shape: (2160, 3840))
            # resize label_img without interpolation (want the resulting image to
            # still only contain pixel values corresponding to an object class):
            label_img = cv2.resize(label_img, (self.new_img_w, self.new_img_h),
                                   interpolation=cv2.INTER_NEAREST) # (shape: (768, 768))
    
            # # # # # # # # debug visualization START
            # cv2.imshow("test", img)
            # cv2.waitKey(0)
            #
            # cv2.imshow("test", label_img)
            # cv2.waitKey(0)
            # # # # # # # # debug visualization END
    
            # normalize the img (with the mean and std for the pretrained ResNet):
            img = img/255.0
            img = img - np.array([0.485, 0.456, 0.406])
            img = img/np.array([0.229, 0.224, 0.225]) # (shape: (768, 768, 3))
            img = np.transpose(img, (2, 0, 1)) # (shape: (3, 768, 768))
            img = img.astype(np.float32)
    
            # convert numpy -> torch:
            img = torch.from_numpy(img) # (shape: (3, 768, 768))
            label_img = torch.from_numpy(label_img) # (shape: (768, 768))
    
            return (img, label_img, img_id)
    
        def __len__(self):
            return self.num_examples
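
    A short usage sketch for the two Dataset classes above, assuming the dataset root described earlier (the path below is illustrative): wrap them in PyTorch DataLoaders and pull one batch to check the shapes.

    # Usage sketch: wrap the Dataset classes above in DataLoaders (the root path is illustrative).
    if __name__ == "__main__":
        data_root = "/path/to/UAVidDataset"   # assumed dataset root containing train/, valid/ and labelimg/
        train_dataset = DatasetTrain(uavid_data_path=data_root, uavid_meta_path=data_root)
        val_dataset = DatasetVal(uavid_data_path=data_root, uavid_meta_path=data_root)

        train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=2,
                                                   shuffle=True, num_workers=2)
        val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=2,
                                                 shuffle=False, num_workers=2)

        imgs, label_imgs = next(iter(train_loader))
        print(imgs.shape, label_imgs.shape)   # e.g. torch.Size([2, 3, 768, 768]) torch.Size([2, 768, 768])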

     

    展开全文
  • 这两周数据挖掘课期末project我们组选的课题也是遥感图像的语义分割,所以刚好又把前段时间做的成果重新整理和加强了一下,故写了这篇文章,记录一下用深度学习做遥感图像语义分割的完整流程以及一些好的思路和技巧...

    from:【Keras】基于SegNet和U-Net的遥感图像语义分割

    上两个月参加了个比赛,做的是对遥感高清图像做语义分割,美其名曰“天空之眼”。这两周数据挖掘课期末project我们组选的课题也是遥感图像的语义分割,所以刚好又把前段时间做的成果重新整理和加强了一下,故写了这篇文章,记录一下用深度学习做遥感图像语义分割的完整流程以及一些好的思路和技巧。

     

    数据集

    首先介绍一下数据,我们这次采用的数据集是CCF大数据比赛提供的数据(2015年中国南方某城市的高清遥感图像),这是一个小数据集,里面包含了5张带标注的大尺寸RGB遥感图像(尺寸范围从3000×3000到6000×6000),里面一共标注了4类物体,植被(标记1)、建筑(标记2)、水体(标记3)、道路(标记4)以及其他(标记0)。其中,耕地、林地、草地均归为植被类,为了更好地观察标注情况,我们将其中三幅训练图片可视化如下:蓝色-水体,黄色-房屋,绿色-植被,棕色-马路。更多数据介绍可以参看这里

    现在说一说我们的数据处理的步骤。我们现在拥有的是5张大尺寸的遥感图像,我们不能直接把这些图像送入网络进行训练,因为内存承受不了而且他们的尺寸也各不相同。因此,我们首先将他们做随机切割,即随机生成x,y坐标,然后抠出该坐标下256*256的小图,并做以下数据增强操作:

    1. 原图和label图都需要旋转:90度,180度,270度
    2. 原图和label图都需要做沿y轴的镜像操作
    3. 原图做模糊操作
    4. 原图做光照调整操作
    5. 原图做增加噪声操作(高斯噪声,椒盐噪声)

    这里我没有采用Keras自带的数据增广函数,而是自己使用opencv编写了相应的增强函数。

     
    1. img_w = 256

    2. img_h = 256

    3.  
    4. image_sets = ['1.png','2.png','3.png','4.png','5.png']

    5.  
    6. def gamma_transform(img, gamma):

    7. gamma_table = [np.power(x / 255.0, gamma) * 255.0 for x in range(256)]

    8. gamma_table = np.round(np.array(gamma_table)).astype(np.uint8)

    9. return cv2.LUT(img, gamma_table)

    10.  
    11. def random_gamma_transform(img, gamma_vari):

    12. log_gamma_vari = np.log(gamma_vari)

    13. alpha = np.random.uniform(-log_gamma_vari, log_gamma_vari)

    14. gamma = np.exp(alpha)

    15. return gamma_transform(img, gamma)

    16.  
    17.  
    18. def rotate(xb,yb,angle):

    19. M_rotate = cv2.getRotationMatrix2D((img_w/2, img_h/2), angle, 1)

    20. xb = cv2.warpAffine(xb, M_rotate, (img_w, img_h))

    21. yb = cv2.warpAffine(yb, M_rotate, (img_w, img_h))

    22. return xb,yb

    23.  
    24. def blur(img):

    25. img = cv2.blur(img, (3, 3));

    26. return img

    27.  
    28. def add_noise(img):

    29. for i in range(200): #添加点噪声

    30. temp_x = np.random.randint(0,img.shape[0])

    31. temp_y = np.random.randint(0,img.shape[1])

    32. img[temp_x][temp_y] = 255

    33. return img

    34.  
    35.  
    36. def data_augment(xb,yb):

    37. if np.random.random() < 0.25:

    38. xb,yb = rotate(xb,yb,90)

    39. if np.random.random() < 0.25:

    40. xb,yb = rotate(xb,yb,180)

    41. if np.random.random() < 0.25:

    42. xb,yb = rotate(xb,yb,270)

    43. if np.random.random() < 0.25:

    44. xb = cv2.flip(xb, 1) # flipcode > 0:沿y轴翻转

    45. yb = cv2.flip(yb, 1)

    46.  
    47. if np.random.random() < 0.25:

    48. xb = random_gamma_transform(xb,1.0)

    49.  
    50. if np.random.random() < 0.25:

    51. xb = blur(xb)

    52.  
    53. if np.random.random() < 0.2:

    54. xb = add_noise(xb)

    55.  
    56. return xb,yb

    57.  
    58. def creat_dataset(image_num = 100000, mode = 'original'):

    59. print('creating dataset...')

    60. image_each = image_num / len(image_sets)

    61. g_count = 0

    62. for i in tqdm(range(len(image_sets))):

    63. count = 0

    64. src_img = cv2.imread('./data/src/' + image_sets[i]) # 3 channels

    65. label_img = cv2.imread('./data/label/' + image_sets[i],cv2.IMREAD_GRAYSCALE) # single channel

    66. X_height,X_width,_ = src_img.shape

    67. while count < image_each:

    68. random_width = random.randint(0, X_width - img_w - 1)

    69. random_height = random.randint(0, X_height - img_h - 1)

    70. src_roi = src_img[random_height: random_height + img_h, random_width: random_width + img_w,:]

    71. label_roi = label_img[random_height: random_height + img_h, random_width: random_width + img_w]

    72. if mode == 'augment':

    73. src_roi,label_roi = data_augment(src_roi,label_roi)

    74.  
    75. visualize = np.zeros((256,256)).astype(np.uint8)

    76. visualize = label_roi *50

    77.  
    78. cv2.imwrite(('./aug/train/visualize/%d.png' % g_count),visualize)

    79. cv2.imwrite(('./aug/train/src/%d.png' % g_count),src_roi)

    80. cv2.imwrite(('./aug/train/label/%d.png' % g_count),label_roi)

    81. count += 1

    82. g_count += 1

    经过上面数据增强操作后,我们得到了较大的训练集:100000张256*256的图片。

    卷积神经网络

    面对这类图像语义分割的任务,我们可以选取的经典网络有很多,比如FCN,U-Net,SegNet,DeepLab,RefineNet,Mask Rcnn,Hed Net这些都是非常经典而且在很多比赛都广泛采用的网络架构。所以我们就可以从中选取一两个经典网络作为我们这个分割任务的解决方案。我们根据我们小组的情况,选取了U-Net和SegNet作为我们的主体网络进行实验。

    SegNet

    SegNet已经出来好几年了,这不是一个最新、效果最好的语义分割网络,但是它胜在网络结构清晰易懂,训练快速坑少,所以我们也采取它来做同样的任务。SegNet网络结构是编码器-解码器的结构,非常优雅,值得注意的是,SegNet做语义分割时通常在末端加入CRF模块做后处理,旨在进一步精修边缘的分割结果。有兴趣深究的可以看看这里

    现在讲解代码部分,首先我们先定义好SegNet的网络结构。

     
    1. def SegNet():

    2. model = Sequential()

    3. #encoder

    4. model.add(Conv2D(64,(3,3),strides=(1,1),input_shape=(3,img_w,img_h),padding='same',activation='relu'))

    5. model.add(BatchNormalization())

    6. model.add(Conv2D(64,(3,3),strides=(1,1),padding='same',activation='relu'))

    7. model.add(BatchNormalization())

    8. model.add(MaxPooling2D(pool_size=(2,2)))

    9. #(128,128)

    10. model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    11. model.add(BatchNormalization())

    12. model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    13. model.add(BatchNormalization())

    14. model.add(MaxPooling2D(pool_size=(2, 2)))

    15. #(64,64)

    16. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    17. model.add(BatchNormalization())

    18. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    19. model.add(BatchNormalization())

    20. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    21. model.add(BatchNormalization())

    22. model.add(MaxPooling2D(pool_size=(2, 2)))

    23. #(32,32)

    24. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    25. model.add(BatchNormalization())

    26. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    27. model.add(BatchNormalization())

    28. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    29. model.add(BatchNormalization())

    30. model.add(MaxPooling2D(pool_size=(2, 2)))

    31. #(16,16)

    32. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    33. model.add(BatchNormalization())

    34. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    35. model.add(BatchNormalization())

    36. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    37. model.add(BatchNormalization())

    38. model.add(MaxPooling2D(pool_size=(2, 2)))

    39. #(8,8)

    40. #decoder

    41. model.add(UpSampling2D(size=(2,2)))

    42. #(16,16)

    43. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    44. model.add(BatchNormalization())

    45. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    46. model.add(BatchNormalization())

    47. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    48. model.add(BatchNormalization())

    49. model.add(UpSampling2D(size=(2, 2)))

    50. #(32,32)

    51. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    52. model.add(BatchNormalization())

    53. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    54. model.add(BatchNormalization())

    55. model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    56. model.add(BatchNormalization())

    57. model.add(UpSampling2D(size=(2, 2)))

    58. #(64,64)

    59. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    60. model.add(BatchNormalization())

    61. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    62. model.add(BatchNormalization())

    63. model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    64. model.add(BatchNormalization())

    65. model.add(UpSampling2D(size=(2, 2)))

    66. #(128,128)

    67. model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    68. model.add(BatchNormalization())

    69. model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    70. model.add(BatchNormalization())

    71. model.add(UpSampling2D(size=(2, 2)))

    72. #(256,256)

    73. model.add(Conv2D(64, (3, 3), strides=(1, 1), input_shape=(3,img_w, img_h), padding='same', activation='relu'))

    74. model.add(BatchNormalization())

    75. model.add(Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu'))

    76. model.add(BatchNormalization())

    77. model.add(Conv2D(n_label, (1, 1), strides=(1, 1), padding='same'))

    78. model.add(Reshape((n_label,img_w*img_h)))

    79. #axis=1和axis=2互换位置,等同于np.swapaxes(layer,1,2)

    80. model.add(Permute((2,1)))

    81. model.add(Activation('softmax'))

    82. model.compile(loss='categorical_crossentropy',optimizer='sgd',metrics=['accuracy'])

    83. model.summary()

    84. return model

    然后需要读入数据集。这里我们选择的验证集大小是训练集的0.25。

     
    1. def get_train_val(val_rate = 0.25):

    2. train_url = []

    3. train_set = []

    4. val_set = []

    5. for pic in os.listdir(filepath + 'src'):

    6. train_url.append(pic)

    7. random.shuffle(train_url)

    8. total_num = len(train_url)

    9. val_num = int(val_rate * total_num)

    10. for i in range(len(train_url)):

    11. if i < val_num:

    12. val_set.append(train_url[i])

    13. else:

    14. train_set.append(train_url[i])

    15. return train_set,val_set

    16.  
    17. # data for training

    18. def generateData(batch_size,data=[]):

    19. #print 'generateData...'

    20. while True:

    21. train_data = []

    22. train_label = []

    23. batch = 0

    24. for i in (range(len(data))):

    25. url = data[i]

    26. batch += 1

    27. #print (filepath + 'src/' + url)

    28. #img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))

    29. img = load_img(filepath + 'src/' + url)

    30. img = img_to_array(img)

    31. # print img

    32. # print img.shape

    33. train_data.append(img)

    34. #label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h),grayscale=True)

    35. label = load_img(filepath + 'label/' + url, grayscale=True)

    36. label = img_to_array(label).reshape((img_w * img_h,))

    37. # print label.shape

    38. train_label.append(label)

    39. if batch % batch_size==0:

    40. #print 'get enough bacth!\n'

    41. train_data = np.array(train_data)

    42. train_label = np.array(train_label).flatten()

    43. train_label = labelencoder.transform(train_label)

    44. train_label = to_categorical(train_label, num_classes=n_label)

    45. train_label = train_label.reshape((batch_size,img_w * img_h,n_label))

    46. yield (train_data,train_label)

    47. train_data = []

    48. train_label = []

    49. batch = 0

    50.  
    51. # data for validation

    52. def generateValidData(batch_size,data=[]):

    53. #print 'generateValidData...'

    54. while True:

    55. valid_data = []

    56. valid_label = []

    57. batch = 0

    58. for i in (range(len(data))):

    59. url = data[i]

    60. batch += 1

    61. #img = load_img(filepath + 'src/' + url, target_size=(img_w, img_h))

    62. img = load_img(filepath + 'src/' + url)

    63. #print img

    64. #print (filepath + 'src/' + url)

    65. img = img_to_array(img)

    66. # print img.shape

    67. valid_data.append(img)

    68. #label = load_img(filepath + 'label/' + url, target_size=(img_w, img_h),grayscale=True)

    69. label = load_img(filepath + 'label/' + url, grayscale=True)

    70. label = img_to_array(label).reshape((img_w * img_h,))

    71. # print label.shape

    72. valid_label.append(label)

    73. if batch % batch_size==0:

    74. valid_data = np.array(valid_data)

    75. valid_label = np.array(valid_label).flatten()

    76. valid_label = labelencoder.transform(valid_label)

    77. valid_label = to_categorical(valid_label, num_classes=n_label)

    78. valid_label = valid_label.reshape((batch_size,img_w * img_h,n_label))

    79. yield (valid_data,valid_label)

    80. valid_data = []

    81. valid_label = []

    82. batch = 0

    然后定义一下我们训练的过程,在这个任务上,我们把batch size定为16,epoch定为30,每次都存储最佳model(save_best_only=True),并且在训练结束时绘制loss/acc曲线,并存储起来。

     
    1. def train(args):

    2. EPOCHS = 30

    3. BS = 16

    4. model = SegNet()

    5. modelcheck = ModelCheckpoint(args['model'],monitor='val_acc',save_best_only=True,mode='max')

    6. callable = [modelcheck]

    7. train_set,val_set = get_train_val()

    8. train_numb = len(train_set)

    9. valid_numb = len(val_set)

    10. print ("the number of train data is",train_numb)

    11. print ("the number of val data is",valid_numb)

    12. H = model.fit_generator(generator=generateData(BS,train_set),steps_per_epoch=train_numb//BS,epochs=EPOCHS,verbose=1,

    13. validation_data=generateValidData(BS,val_set),validation_steps=valid_numb//BS,callbacks=callable,max_q_size=1)

    14.  
    15. # plot the training loss and accuracy

    16. plt.style.use("ggplot")

    17. plt.figure()

    18. N = EPOCHS

    19. plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")

    20. plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")

    21. plt.plot(np.arange(0, N), H.history["acc"], label="train_acc")

    22. plt.plot(np.arange(0, N), H.history["val_acc"], label="val_acc")

    23. plt.title("Training Loss and Accuracy on SegNet Satellite Seg")

    24. plt.xlabel("Epoch #")

    25. plt.ylabel("Loss/Accuracy")

    26. plt.legend(loc="lower left")

    27. plt.savefig(args["plot"])

    然后开始漫长的训练,训练时间接近3天,绘制出的loss/acc图如下:

    训练loss降到0.1左右,acc可以去到0.9,但是验证集的loss和acc都没那么好,貌似存在点问题。

    先不管了,先看看预测结果吧。

    这里需要思考一下怎么预测整张遥感图像。我们知道,我们训练模型时选择的图片输入是256×256,所以我们预测时也要采用256×256的图片尺寸送进模型预测。现在我们要考虑一个问题,我们该怎么将这些预测好的小图重新拼接成一个大图呢?这里给出一个最基础的方案:先给大图做padding 0操作,得到一副padding过的大图,同时我们也生成一个与该图一样大的全0图A,把图像的尺寸补齐为256的倍数,然后以256为步长切割大图,依次将小图送进模型预测,预测好的小图则放在A的相应位置上,依次进行,最终得到预测好的整张大图(即A),再做图像切割,切割成原先图片的尺寸,完成整个预测流程。

     
    1. def predict(args):

    2. # load the trained convolutional neural network

    3. print("[INFO] loading network...")

    4. model = load_model(args["model"])

    5. stride = args['stride']

    6. for n in range(len(TEST_SET)):

    7. path = TEST_SET[n]

    8. #load the image

    9. image = cv2.imread('./test/' + path)

    10. # pre-process the image for classification

    11. #image = image.astype("float") / 255.0

    12. #image = img_to_array(image)

    13. h,w,_ = image.shape

    14. padding_h = (h//stride + 1) * stride

    15. padding_w = (w//stride + 1) * stride

    16. padding_img = np.zeros((padding_h,padding_w,3),dtype=np.uint8)

    17. padding_img[0:h,0:w,:] = image[:,:,:]

    18. padding_img = padding_img.astype("float") / 255.0

    19. padding_img = img_to_array(padding_img)

    20. print 'src:',padding_img.shape

    21. mask_whole = np.zeros((padding_h,padding_w),dtype=np.uint8)

    22. for i in range(padding_h//stride):

    23. for j in range(padding_w//stride):

    24. crop = padding_img[:3,i*stride:i*stride+image_size,j*stride:j*stride+image_size]

    25. _,ch,cw = crop.shape

    26. if ch != 256 or cw != 256:

    27. print 'invalid size!'

    28. continue

    29.  
    30. crop = np.expand_dims(crop, axis=0)

    31. #print 'crop:',crop.shape

    32. pred = model.predict_classes(crop,verbose=2)

    33. pred = labelencoder.inverse_transform(pred[0])

    34. #print (np.unique(pred))

    35. pred = pred.reshape((256,256)).astype(np.uint8)

    36. #print 'pred:',pred.shape

    37. mask_whole[i*stride:i*stride+image_size,j*stride:j*stride+image_size] = pred[:,:]

    38.  
    39.  
    40. cv2.imwrite('./predict/pre'+str(n+1)+'.png',mask_whole[0:h,0:w])

    预测的效果图如下:

    一眼看去,效果真的不错,但是仔细看一下,就会发现有个很大的问题:拼接痕迹过于明显了!那怎么解决这类边缘问题呢?很直接的想法就是缩小切割时的滑动步伐,比如我们把切割步伐改为128,那么拼接时就会有一般的图像发生重叠,这样做可以尽可能地减少拼接痕迹。

    U-Net

    对于这个语义分割任务,我们毫不犹豫地选择了U-Net作为我们的方案,原因很简单,我们参考很多类似的遥感图像分割比赛的资料,绝大多数获奖的选手使用的都是U-Net模型。在这么多的好评下,我们选择U-Net也就毫无疑问了。

    U-Net有很多优点,最大卖点就是它可以在小数据集上也能train出一个好的模型,这个优点对于我们这个任务来说真的非常适合。而且,U-Net在训练速度上也是非常快的,这对于需要短时间就得出结果的期末project来说也是非常合适。U-Net在网络架构上还是非常优雅的,整个呈现U形,故起名U-Net。这里不打算详细介绍U-Net结构,有兴趣的深究的可以看看论文。

    现在开始谈谈代码细节。首先我们定义一下U-Net的网络结构,这里用的deep learning框架还是Keras。

    注意到,我们这里训练的模型是一个多分类模型,其实更好的做法是,训练一个二分类模型(使用二分类的标签),对每一类物体进行预测,得到4张预测图,再做预测图叠加,合并成一张完整的包含4类的预测图,这个策略在效果上肯定好于一个直接4分类的模型。所以,U-Net这边我们采取的思路就是对于每一类的分类都训练一个二分类模型,最后再将每一类的预测结果组合成一个四分类的结果。

    定义U-Net结构,注意了,这里的loss function我们选了binary_crossentropy,因为我们要训练的是二分类模型。

     
    1. def unet():

    2. inputs = Input((3, img_w, img_h))

    3.  
    4. conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)

    5. conv1 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv1)

    6. pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)

    7.  
    8. conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(pool1)

    9. conv2 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv2)

    10. pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)

    11.  
    12. conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(pool2)

    13. conv3 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv3)

    14. pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)

    15.  
    16. conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(pool3)

    17. conv4 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv4)

    18. pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)

    19.  
    20. conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(pool4)

    21. conv5 = Conv2D(512, (3, 3), activation="relu", padding="same")(conv5)

    22.  
    23. up6 = concatenate([UpSampling2D(size=(2, 2))(conv5), conv4], axis=1)

    24. conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(up6)

    25. conv6 = Conv2D(256, (3, 3), activation="relu", padding="same")(conv6)

    26.  
    27. up7 = concatenate([UpSampling2D(size=(2, 2))(conv6), conv3], axis=1)

    28. conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(up7)

    29. conv7 = Conv2D(128, (3, 3), activation="relu", padding="same")(conv7)

    30.  
    31. up8 = concatenate([UpSampling2D(size=(2, 2))(conv7), conv2], axis=1)

    32. conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(up8)

    33. conv8 = Conv2D(64, (3, 3), activation="relu", padding="same")(conv8)

    34.  
    35. up9 = concatenate([UpSampling2D(size=(2, 2))(conv8), conv1], axis=1)

    36. conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(up9)

    37. conv9 = Conv2D(32, (3, 3), activation="relu", padding="same")(conv9)

    38.  
    39. conv10 = Conv2D(n_label, (1, 1), activation="sigmoid")(conv9)

    40. #conv10 = Conv2D(n_label, (1, 1), activation="softmax")(conv9)

    41.  
    42. model = Model(inputs=inputs, outputs=conv10)

    43. model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

    44. return model

    读取数据的组织方式有一些改动。

     
    # data for training
    def generateData(batch_size, data=[]):
        # print('generateData...')
        while True:
            train_data = []
            train_label = []
            batch = 0
            for i in range(len(data)):
                url = data[i]
                batch += 1
                img = load_img(filepath + 'src/' + url)
                img = img_to_array(img)
                train_data.append(img)
                label = load_img(filepath + 'label/' + url, grayscale=True)
                label = img_to_array(label)
                # print(label.shape)
                train_label.append(label)
                if batch % batch_size == 0:
                    # print('got enough for one batch')
                    train_data = np.array(train_data)
                    train_label = np.array(train_label)

                    yield (train_data, train_label)
                    train_data = []
                    train_label = []
                    batch = 0

    # data for validation
    def generateValidData(batch_size, data=[]):
        # print('generateValidData...')
        while True:
            valid_data = []
            valid_label = []
            batch = 0
            for i in range(len(data)):
                url = data[i]
                batch += 1
                img = load_img(filepath + 'src/' + url)
                img = img_to_array(img)
                valid_data.append(img)
                label = load_img(filepath + 'label/' + url, grayscale=True)
                label = img_to_array(label)  # the label must be converted to an array as well
                valid_label.append(label)
                if batch % batch_size == 0:
                    valid_data = np.array(valid_data)
                    valid_label = np.array(valid_label)
                    yield (valid_data, valid_label)
                    valid_data = []
                    valid_label = []
                    batch = 0

    Training: specify the output model name and the location of the training set.

    python unet.py --model unet_buildings20.h5 --data ./unet_train/buildings/
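    The unet.py entry point itself is not reproduced here; a minimal sketch of what it might look like, built from the unet() and generator functions above, is given below. The epoch count, batch size, 9:1 split and monitored metric are assumptions, not the original settings:

    import argparse
    import os
    import random
    from keras.callbacks import ModelCheckpoint

    EPOCHS = 30   # assumed value
    BS = 16       # assumed value

    def train(args):
        global filepath
        filepath = args['data']                      # used by generateData / generateValidData
        imgs = os.listdir(filepath + 'src/')
        random.shuffle(imgs)
        split = int(len(imgs) * 0.9)                 # 9:1 train / validation split (assumption)
        train_set, val_set = imgs[:split], imgs[split:]

        model = unet()
        checkpoint = ModelCheckpoint(args['model'], monitor='val_acc',
                                     save_best_only=True, mode='max')
        model.fit_generator(generator=generateData(BS, train_set),
                            steps_per_epoch=len(train_set) // BS,
                            epochs=EPOCHS,
                            validation_data=generateValidData(BS, val_set),
                            validation_steps=len(val_set) // BS,
                            callbacks=[checkpoint])

    if __name__ == '__main__':
        ap = argparse.ArgumentParser()
        ap.add_argument('--model', required=True, help='path to save the trained model')
        ap.add_argument('--data', required=True, help='training directory containing src/ and label/')
        train(vars(ap.parse_args()))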

    When predicting a single remote sensing image we run all 4 models separately, which gives us 4 masks (for example, the figure below shows the prediction of the trained buildings model). These 4 masks then need to be merged into one. How should they be combined? My idea is this: by inspecting the per-class predictions we can see, qualitatively, which classes are predicted more reliably, and we can rank the masks accordingly, e.g. priority: building > water > road > vegetation. Then, whenever several masks all claim a pixel as their own class, the pixel is assigned the label with the highest priority. The code below illustrates the idea:

     
    def combind_all_mask():
        for mask_num in tqdm(range(3)):
            if mask_num == 0:
                final_mask = np.zeros((5142, 5664), np.uint8)  # all-zero (black) image, same size as the original
            elif mask_num == 1:
                final_mask = np.zeros((2470, 4011), np.uint8)
            elif mask_num == 2:
                final_mask = np.zeros((6116, 3356), np.uint8)
            # final_mask = cv2.imread('final_1_8bits_predict.png', 0)

            if mask_num == 0:
                mask_pool = mask1_pool
            elif mask_num == 1:
                mask_pool = mask2_pool
            elif mask_num == 2:
                mask_pool = mask3_pool
            final_name = img_sets[mask_num]
            for idx, name in enumerate(mask_pool):
                img = cv2.imread('./predict_mask/' + name, 0)
                height, width = img.shape
                label_value = idx + 1  # corresponding label value
                for i in tqdm(range(height)):  # priority: building > water > road > vegetation
                    for j in range(width):
                        if img[i, j] == 255:
                            if label_value == 2:
                                final_mask[i, j] = label_value
                            elif label_value == 3 and final_mask[i, j] != 2:
                                final_mask[i, j] = label_value
                            elif label_value == 4 and final_mask[i, j] != 2 and final_mask[i, j] != 3:
                                final_mask[i, j] = label_value
                            elif label_value == 1 and final_mask[i, j] == 0:
                                final_mask[i, j] = label_value

            cv2.imwrite('./final_result/' + final_name, final_mask)


    print('combining mask...')
    combind_all_mask()
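    The per-pixel loops above are easy to follow but slow on images several thousand pixels per side. A vectorised numpy version of the same priority rule (building > water > road > vegetation) could look like the sketch below; combine_masks_fast is a hypothetical helper that assumes the same 255/0 binary mask convention:

    import numpy as np

    def combine_masks_fast(vegetation, building, water, road):
        # Each argument is a binary mask (255 = that class, 0 = background), all of equal shape.
        final = np.zeros(building.shape, dtype=np.uint8)
        final[vegetation == 255] = 1   # lowest priority written first
        final[road == 255] = 4
        final[water == 255] = 3
        final[building == 255] = 2     # highest priority written last, so it wins conflicts
        return final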

    Model ensembling

    Ensemble learning is used in almost every competition of this kind, and doing it well is essential for a good score. The idea in brief: we use two models, and each model is also trained and run with different parameters, so we end up with many predicted mask images. We can then fuse them with a per-pixel vote: for each pixel position, the class predicted by the most masks becomes the class of that pixel. As the saying goes, "three cobblers with their wits combined beat Zhuge Liang"; this kind of ensembling removes many obviously misclassified pixels and substantially improves the predictions.

    Code for the majority-vote strategy:

     
    import numpy as np
    import cv2
    import argparse

    RESULT_PREFIXX = ['./result1/', './result2/', './result3/']

    # each mask has 5 classes: 0~4

    def vote_per_image(image_id):
        result_list = []
        for j in range(len(RESULT_PREFIXX)):
            im = cv2.imread(RESULT_PREFIXX[j] + str(image_id) + '.png', 0)
            result_list.append(im)

        # each pixel
        height, width = result_list[0].shape
        vote_mask = np.zeros((height, width))
        for h in range(height):
            for w in range(width):
                record = np.zeros((1, 5))
                for n in range(len(result_list)):
                    mask = result_list[n]
                    pixel = mask[h, w]
                    # print('pix:', pixel)
                    record[0, pixel] += 1

                label = record.argmax()
                # print(label)
                vote_mask[h, w] = label

        cv2.imwrite('vote_mask' + str(image_id) + '.png', vote_mask)


    vote_per_image(3)
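    The triple loop above also becomes slow on full-size images. The same majority vote can be done in numpy by counting one-hot votes per class, as in this hedged sketch (vote_masks_fast is a made-up name; ties are broken in favour of the lower label, exactly like record.argmax() above):

    import numpy as np

    def vote_masks_fast(masks, num_classes=5):
        # masks: list of 2-D arrays with values 0..num_classes-1, all the same shape.
        stack = np.stack(masks)                            # (n_models, h, w)
        votes = np.zeros((num_classes,) + stack.shape[1:], dtype=np.int32)
        for c in range(num_classes):
            votes[c] = (stack == c).sum(axis=0)            # votes received by class c
        return votes.argmax(axis=0).astype(np.uint8)       # majority class per pixel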

    Prediction after model ensembling:

    As you can see, the ensembled prediction is indeed noticeably better; the obviously misclassified pixels are gone.

    An extra idea: GANs

    We also gave the data side some extra thought. To address the small size of the dataset, we had the idea of using a generative adversarial network to synthesise fake satellite images and thereby enlarge the dataset. Training on the mixed real + synthetic data should further improve the network's generalisation. The idea follows the pix2pix paper, a very interesting piece of work on image-to-image translation. It shows, among other things, how annotated maps can be turned into fake aerial imagery, which struck us as refreshingly novel, so we wanted to follow that approach and build our own synthetic satellite dataset. The Map-to-Aerial results in the paper are quite impressive.

    Our own attempt, however, was not very successful (see the figure below; the image on the right is one of our generated fakes). There are many reasons for the poor quality, the biggest being the annotations. Because the generated satellite images were of low quality, the idea was abandoned and the fake images were never used for training. The approach itself still seems feasible, though: with suitable annotations it should be possible to generate very convincing fake imagery.

    Summary

    For this kind of remote sensing semantic segmentation there are many more ideas to try. The most obvious one is to implement all the classic segmentation networks, see which performs best, and then ensemble them; as long as the ensembling is done well the result is usually strong. With just the simple pipeline above (data augmentation, classic models, ensembling) we already reached the top 5% of the competition. There are of course further tricks that squeeze out a bit more performance, which are not covered here; what matters is grasping the overall modelling approach. The complete code is available on my GitHub.

     

    Data download:

    Link: https://pan.baidu.com/s/1i6oMukH

    Password: yqj2


    Multimodal fusion for semantic segmentation of high-resolution remote sensing images (Python)

    Paper: http://www.cnki.com.cn/Article/CJFDTotal-ZNZK202004012.htm

    1. The SE-UNet network model

    (figure: SE-UNet network model)

    2. SE-UNet design details

    (figure: SE-UNet design details)

    3. PyTorch implementation of SE-UNet

    import torch.nn as nn
    import torch.utils.model_zoo as model_zoo
    from torch.nn import functional as F
    import torch
    
    class SEBlock(nn.Module):
    
        def __init__(self,ch_in):
            super(SEBlock, self).__init__()
            self.relu = nn.ReLU(inplace=False)
            self.global_pool = nn.AdaptiveAvgPool2d((1, 1))  # N x ch_in x 1 x 1
            self.fc1 = nn.Linear(in_features = int(ch_in), out_features = int(ch_in//2))
            self.fc2 = nn.Linear(in_features = int(ch_in//2), out_features = int(ch_in))
            self.sigmoid = nn.Sigmoid()
    
        def forward(self, x):
            # Squeeze
            out = self.global_pool(x)   
            out = out.view(out.size(0), -1)
            # Excitation
            out = self.fc1(out)
            out = self.relu(out)
            out = self.fc2(out)
            out = self.sigmoid(out)
            out = out.view(out.size(0), out.size(1), 1, 1)
            # Scale
            # out = out * x
            # out += x
            # out = self.relu(out)
    
            return out
            
    class DoubleConv(nn.Module):
        def __init__(self, in_ch, out_ch):
            super(DoubleConv, self).__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),  # a BN layer is added after each conv
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True)
            )
    
        def forward(self, input):
            return self.conv(input)
    
    class Unet(nn.Module):
        def __init__(self, in_ch, out_ch):
            super(Unet, self).__init__()
            self.pool = nn.MaxPool2d(2)
            self.conv1 = DoubleConv(in_ch, 64)
            self.pool1 = nn.MaxPool2d(2)
            self.conv2 = DoubleConv(64, 128)
            self.pool2 = nn.MaxPool2d(2)
            self.conv3 = DoubleConv(128, 256)
            self.pool3 = nn.MaxPool2d(2)
            self.conv4 = DoubleConv(256, 512)
            self.pool4 = nn.MaxPool2d(2)
            self.conv5 = DoubleConv(512, 1024)
            # Transposed convolution; plain upsampling could be used instead (kernel size k = stride, where stride is the upsampling factor)
            self.up6 = nn.ConvTranspose2d(1024, 512, 2, stride=2)
            self.conv6 = DoubleConv(1024, 512)
            self.up7 = nn.ConvTranspose2d(512, 256, 2, stride=2)
            self.conv7 = DoubleConv(512, 256)
            self.up8 = nn.ConvTranspose2d(256, 128, 2, stride=2)
            self.conv8 = DoubleConv(256, 128)
            self.up9 = nn.ConvTranspose2d(128, 64, 2, stride=2)
            self.conv9 = DoubleConv(128, 64)
            self.conv10 = nn.Conv2d(64, out_ch, 1)
            self.conv1_dilation = nn.Conv2d(2048, 256, 1, stride=1, padding=0, bias=False, dilation=1)  # dilation is the atrous rate, i.e. the gap between sampled positions
            self.conv2_dilation = nn.Conv2d(2048, 256, 2, stride=1, padding=2, bias=False, dilation=2)  # dilation is the atrous rate
            self.conv4_dilation = nn.Conv2d(2048, 256, 4, stride=1, padding=4, bias=False, dilation=4)  # dilation is the atrous rate
            self.global_pool = nn.AdaptiveAvgPool2d((1, 1)) 
            self.upsample = nn.Upsample(scale_factor=7, mode='bicubic', align_corners=True) 
            self.conv_c = nn.Conv2d(2816, 1024, 1, stride=1, padding=0, bias=False, dilation=1)  # dilation is the atrous rate
            self.upsample1 = nn.Upsample(scale_factor=2, mode='bicubic', align_corners=True) 
    
            self.R1 = nn.Sequential(
                nn.Conv2d(1, 64, 3, 1, 1, bias=False),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=False)
        ) # N x 64 x 512 x 512
    
            self.RP2 = nn.Sequential(
                nn.Conv2d(64, 128, 3, 1, 1, bias=False),
                nn.BatchNorm2d(128),
                nn.ReLU(inplace=False),
                nn.MaxPool2d(2, 2),
                nn.ReLU(inplace=False)
        ) # N x 128 x 256 x 256
    
            self.RP3 = nn.Sequential(
                nn.Conv2d(128, 256, 3, 1, 1, bias=False),
                nn.BatchNorm2d(256),
                nn.ReLU(inplace=False),
                nn.MaxPool2d(2, 2),
                nn.ReLU(inplace=False)
            )
    
            self.RP4 = nn.Sequential(
                nn.Conv2d(256, 512, 3, 1, 1, bias=False),
                nn.BatchNorm2d(512),
                nn.ReLU(inplace=False),
                nn.MaxPool2d(2, 2),
                nn.ReLU(inplace=False)
            )
            self.RP5 = nn.Sequential(
                nn.Conv2d(512, 1024, 3, 1, 1, bias=False),
                nn.BatchNorm2d(1024),
                nn.ReLU(inplace=False),
                nn.MaxPool2d(2, 2),
                nn.ReLU(inplace=False)
            )
            self.SE1 = SEBlock(64)
            self.SE2 = SEBlock(128)
            self.SE3 = SEBlock(256)
            self.SE4 = SEBlock(512)
            self.SE5 = SEBlock(1024)
    
        def forward(self, DSM, RGB):
            c1_DSM = self.R1(DSM)        # [2, 64, 512, 512]
            c1_SE_DSM = self.SE1(c1_DSM) # [2, 64,  1,  1]
            c1_RGB = self.conv1(RGB)     # [2, 64, 512, 512]
            c1_RGB = c1_SE_DSM * c1_RGB  # [2, 64, 512, 512]
            p1_RGB = self.pool1(c1_RGB)  # [2, 64, 256, 256]
    
            c2_DSM = self.RP2(c1_DSM)    # [2, 128, 256, 256]
            c2_SE_DSM = self.SE2(c2_DSM) # [2, 128,  1,  1]
            c2_RGB = self.conv2(p1_RGB)  # [2, 128, 256, 256]
            c2_RGB = c2_SE_DSM * c2_RGB  # [2, 128, 256, 256]
            p2_RGB = self.pool2(c2_RGB)  # [2, 128, 128, 128]
    
            c3_DSM = self.RP3(c2_DSM)    # [2, 256, 128, 128]
            c3_SE_DSM = self.SE3(c3_DSM) # [2, 256,  1,  1]
            c3_RGB = self.conv3(p2_RGB)  # [2, 256, 128, 128]
            c3_RGB = c3_SE_DSM * c3_RGB  # [2, 256, 128, 128]
            p3_RGB = self.pool3(c3_RGB)  # [2, 256, 64, 64]
    
            c4_DSM = self.RP4(c3_DSM)    # [2, 512, 64, 64]
            c4_SE_DSM = self.SE4(c4_DSM) # [2, 512,  1,  1]
            c4_RGB = self.conv4(p3_RGB)  # [2, 512, 64, 64]
            c4_RGB = c4_SE_DSM * c4_RGB  # [2, 512, 64, 64]
            p4_RGB = self.pool4(c4_RGB)  # [2, 512, 32, 32]
    
            c5_DSM = self.RP5(c4_DSM)    # [2, 1024, 32, 32]
            c5_SE_DSM = self.SE5(c5_DSM) # [2, 1024,  1,  1]
            c5_RGB = self.conv5(p4_RGB)  # [2, 1024, 32, 32]
            c5_RGB = c5_SE_DSM * c5_RGB  # [2, 1024, 32, 32]
             
            up_6 = self.up6(c5_RGB) # [2, 512, 64, 64]
            merge6 = torch.cat([up_6, c4_RGB], dim=1) # [2, 1024, 64, 64]
            c6 = self.conv6(merge6) # [2, 512, 64, 64]
            up_7 = self.up7(c6)     # [2, 256, 128, 128]
    
            merge7 = torch.cat([up_7, c3_RGB], dim=1) # [2, 512, 128, 128]
            c7 = self.conv7(merge7) # [2, 256, 128, 128]
            up_8 = self.up8(c7)     # [2, 128, 256, 256]
    
            merge8 = torch.cat([up_8, c2_RGB], dim=1) # [2, 256, 256, 256]
            c8 = self.conv8(merge8) # [2, 128, 256, 256]
            up_9 = self.up9(c8)     # [2, 64, 512, 512]
    
            merge9 = torch.cat([up_9, c1_RGB], dim=1) # [2, 128, 512, 512]
            c9 = self.conv9(merge9) # [2, 64, 512, 512]
            c10 = self.conv10(c9)   # [2, 3, 512, 512]
            out = nn.Sigmoid()(c10) # [2, 3, 512, 512]
            return out
    
    if __name__ == "__main__":
        DSM = torch.randn(2, 1, 512, 512)
        RGB = torch.randn(2, 3, 512, 512)
        UNet = Unet(3,3)
        out_result = UNet(DSM,RGB)
        print(out_result)
        print(out_result.shape)
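
    The paper trains this network for multimodal (RGB + DSM) segmentation, but the original training code is not included above, so the following is only a minimal sketch of one training epoch under my own assumptions: a hypothetical DataLoader yielding (dsm, rgb, label) batches with labels shaped like the sigmoid output, BCE loss, and an Adam optimiser.

    import torch
    import torch.nn as nn

    def train_one_epoch(model, loader, optimizer, device='cuda'):
        # loader is a hypothetical DataLoader yielding (dsm, rgb, label) batches,
        # with label shaped like the network output (N, 3, 512, 512) and valued in [0, 1].
        model = model.to(device).train()
        criterion = nn.BCELoss()            # the network already ends in a Sigmoid
        for dsm, rgb, label in loader:
            dsm, rgb, label = dsm.to(device), rgb.to(device), label.to(device)
            optimizer.zero_grad()
            pred = model(dsm, rgb)          # the forward pass takes DSM and RGB separately
            loss = criterion(pred, label)
            loss.backward()
            optimizer.step()

    # usage (hypothetical):
    # model = Unet(3, 3)
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # train_one_epoch(model, loader, optimizer)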
    
  • Remote sensing semantic segmentation: building your own dataset from raw imagery (GF-2 as an example). Contents: 1. acquiring the imagery, 2. preprocessing (image fusion), 3. batch processing of the images...
  • Remote sensing images often contain many bands (frequently more than 4), so OpenCV or PIL are of limited use; this is where GDAL comes in. For example, a ten-band image read with such a function becomes a numpy array of shape [h, w, 10] (a hedged GDAL reading sketch follows after this list). from osgeo import gdal import numpy as np def ...
  • 0.9765726814389497 Postscript: questions are welcome in the comments. Reference: https://github.com/jfzhang95/pytorch-deeplab-xception/blob/master/utils/metrics.py (semantic segmentation evaluation metrics)...
  • Public datasets for remote sensing semantic segmentation: because remote sensing imagery involves massive data volumes, strong scale dependence and strong spatial correlation, semantic segmentation is well suited to extracting and classifying ground objects. Since fully convolutional networks were proposed, convolutional networks have improved not only whole-image classification but also local, structured-output tasks...
  • **Remote sensing image semantic segmentation with SegNet and U-Net (part 1)** Code: https://github.com/AstarLight/Satellite-Segmentation  Dataset: https://pan.baidu.com/s/1ajdS8ZRKY4ihrn0sf-2q2w (extraction code: i4wr); unpacking yields 3 files...
  • This repository contains the project code for the paper "Semantic segmentation of high-resolution aerial imagery with an hourglass-shaped network" ( ). Since the project is still in progress, the preprocessing, WBP post-processing and evaluation code is currently under code review and will be released gradually. In this repository...
  • Deep learning for remote sensing semantic segmentation & object detection. Code on GitHub: WangZhenqing-RS/2021Tianchi_RS. Task description: based on high-resolution remote sensing imagery of different terrain types, participants are expected to use intelligent interpretation techniques to identify and extract...
  • For high-resolution remote sensing segmentation, an improved U-Net-based deep convolutional network is proposed for end-to-end pixel-level semantic segmentation. The original dataset is augmented, a binary model is trained for each land-cover class, and the per-class predictions are then combined into the final segmentation image...
  • This link describes improvements to HRNet. 1. pose_hrnet is a faithful copy of HRNet; the difference from the original author's code is that each case gets its own function, rather than one complicated, hard-to-read function handling every case. Overall, HRNet still has, like Inception...
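    Picking up the GDAL point from the list above: a common way to read a multi-band image into an (h, w, bands) numpy array is sketched below (this is my own minimal version, not the function from that article):

    from osgeo import gdal
    import numpy as np

    def read_multiband(path):
        # Read every band of a remote sensing image into an (h, w, bands) numpy array.
        dataset = gdal.Open(path)
        arr = dataset.ReadAsArray()          # (bands, h, w) for a multi-band image
        if arr.ndim == 2:                    # single-band images come back as (h, w)
            arr = arr[np.newaxis, ...]
        return np.transpose(arr, (1, 2, 0))  # e.g. (h, w, 10) for a ten-band image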
