  • 显著目标检测

    (CVPR’17) Learning to Detect Salient Objects with Image-level Supervision

    Deep Neural Networks (DNNs) have substantially improved the state-of-the-art in salient object detection. However, training DNNs requires costly pixel-level annotations. In this paper, we leverage the observation that image level tags provide important cues of foreground salient objects, and develop a weakly supervised learning method for saliency detection using image-level tags only. The Foreground Inference Network (FIN) is introduced for this challenging task. In the first stage of our training method, FIN is jointly trained with a fully convolutional network (FCN) for image-level tag prediction. A global smooth pooling layer is proposed, enabling FCN to assign object category tags to corresponding object regions, while FIN is capable of capturing all potential foreground regions with the predicted saliency maps. In the second stage, FIN is fine-tuned with its predicted saliency maps as ground truth. For refinement of ground truth, an iterative Conditional Random Field is developed to enforce spatial label consistency and further boost performance. Our method alleviates annotation efforts and allows the usage of existing large scale training sets with image-level tags. Our model runs at 60 FPS, outperforms unsupervised ones with a large margin, and achieves comparable or even superior performance than fully supervised counterparts.

    深度神经网络(DNN)已经大大改善了显著目标检测的SOTA。然而,训练DNN需要昂贵的像素级标注。在本文中,我们利用图像级标签能够提供前景显著目标的重要线索这一观察结果,开发了一种仅使用图像级标签进行显著性检测的弱监督学习方法。前景推理网络(FIN)被引入到这项具有挑战性的任务中。在我们训练方法的第一阶段,FIN与全卷积网络(FCN)联合训练,用于图像级标签的预测。我们提出了一个全局平滑池化层,使FCN能够将物体类别标签分配给相应的物体区域,而FIN能够用预测的显著图捕获所有潜在的前景区域。在第二阶段,FIN以其预测的显著图作为真值进行微调。为了细化真值,我们开发了一个迭代的条件随机场,以加强空间标签的一致性并进一步提高性能。我们的方法减轻了标注工作,并允许使用现有的具有图像级标签的大规模训练集。我们的模型以60 FPS的速度运行,以很大的幅度超越了无监督的模型,并取得了与完全监督的模型相当甚至更高的性能。
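
    A rough sketch of the kind of pooling involved: the Global Smooth Pooling described above interpolates between global max and global average pooling so that image-level classification gradients spread over whole object regions rather than a single peak. The log-sum-exp pooling below is a standard smooth-max surrogate, not the paper's exact operator; the function name and the sharpness parameter r are illustrative assumptions.

        import math
        import torch

        def log_sum_exp_pool(score_maps, r=5.0):
            """Smooth global pooling of per-class activation maps.

            score_maps: (B, C, H, W) class activation maps.
            r: sharpness; large r approaches global max pooling,
               small r approaches global average pooling.
            Returns (B, C) image-level class scores whose gradients are
            spread over all spatial locations instead of a single maximum.
            """
            b, c, h, w = score_maps.shape
            flat = score_maps.flatten(2)                      # (B, C, H*W)
            return (torch.logsumexp(r * flat, dim=2) - math.log(h * w)) / r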


    (ICCV’17) Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector

    In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is “supervision by fusion”, i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and a inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.

    鉴于深度神经网络(DNNs)强大的学习能力,近年来已经建立了深度(卷积)模型来解决显著目标检测的任务。尽管训练这样的深度显著性模型可以显著提高检测性能,但它需要大规模的人工监督,其形式是像素级的人工标注,这是高度劳动密集和耗时的。为了解决这个问题,本文首次尝试在不使用任何人工标注的情况下训练一个深度显著目标检测器。关键的见解是"融合监督",即从弱但快速的无监督显著性模型的融合过程中产生有用的监督信号。基于这一见解,我们在提出的框架中结合了图像内融合流和图像间融合流,以产生学习课程和伪真值,用于监督深度显著目标检测器的训练。在四个基准数据集上的综合实验表明,我们的方法可以接近用完全监督训练的相同网络(在2-5%的性能差距内),更令人鼓舞的是,甚至超过了一些完全监督的最先进方法。
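
    A minimal sketch of how a supervisory signal can be distilled from several weak unsupervised saliency maps of the same image (intra-image fusion); the agreement-based weighting and the threshold tau are illustrative heuristics, not the paper's learned intra-/inter-image fusion streams or curriculum.

        import numpy as np

        def fuse_unsupervised_maps(maps, tau=0.5):
            """Fuse weak unsupervised saliency maps into one pseudo ground truth.

            maps: list of (H, W) float arrays in [0, 1], each produced by a
                  different handcrafted saliency method on the same image.
            tau:  threshold used to binarize the fused map.
            """
            stack = np.stack(maps, axis=0)                     # (M, H, W)
            consensus = stack.mean(axis=0)
            # Down-weight maps that disagree strongly with the consensus,
            # so a badly failing method pollutes the pseudo label less.
            weights = np.array([1.0 / (np.abs(m - consensus).mean() + 1e-6)
                                for m in stack])
            weights /= weights.sum()
            fused = np.tensordot(weights, stack, axes=1)       # (H, W)
            return (fused >= tau).astype(np.float32)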


    (AAAI’18) Weakly Supervised Salient Object Detection Using Image Labels

    Deep learning based salient object detection has recently achieved great success with its performance greatly outperforms any other unsupervised methods. However, annotating per-pixel saliency masks is a tedious and inefficient procedure. In this paper, we note that superior salient object detection can be obtained by iteratively mining and correcting the labeling ambiguity on saliency maps from traditional unsupervised methods. We propose to use the combination of a coarse salient object activation map from the classification network and saliency maps generated from unsupervised methods as pixel-level annotation, and develop a simple yet very effective algorithm to train fully convolutional networks for salient object detection supervised by these noisy annotations. Our algorithm is based on alternately exploiting a graphical model and training a fully convolutional network for model updating. The graphical model corrects the internal labeling ambiguity through spatial consistency and structure preserving while the fully convolutional network helps to correct the cross-image semantic ambiguity and simultaneously update the coarse activation map for next iteration. Experimental results demonstrate that our proposed method greatly outperforms all state-of-the-art unsupervised saliency detection methods and can be comparable to the current best strongly-supervised methods training with thousands of pixel-level saliency map annotations on all public benchmarks.

    基于深度学习的显著目标检测最近取得了巨大的成功,其性能大大超过了任何其他无监督的方法。然而,标注每个像素的显著性mask是一个繁琐而低效的过程。在本文中,我们注意到,通过迭代挖掘和纠正传统无监督方法中对显著图的标注不确定性,可以实现更好的显著性目标检测。我们提出使用来自分类网络的粗略的显著物体激活图和来自无监督方法的显著图的组合作为像素级的标注,并开发出一种简单但非常有效的算法来训练全卷积网络,用于由这些噪声标注监督的显著目标检测。我们的算法是基于交替利用图模型和训练全卷积网络来更新模型。图模型通过空间一致性和结构保留来纠正内部标签的模糊性,而全卷积网络则有助于纠正跨图像语义的模糊性,并同时为下一次迭代更新粗糙的激活图。实验结果表明,我们提出的方法大大超过了所有最先进的无监督的显著性检测方法,并且可以与目前最好的在所有公共基准上用数千个像素级显著图标注进行训练的强监督方法相媲美。


    (CVPR’18) Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective

    The success of current deep saliency detection methods heavily depends on the availability of large-scale supervision in the form of per-pixel labeling. Such supervision, while labor-intensive and not always possible, tends to hinder the generalization ability of the learned models. By contrast, traditional handcrafted features based unsupervised saliency detection methods, even though have been surpassed by the deep supervised methods, are generally dataset-independent and could be applied in the wild. This raises a natural question that “Is it possible to learn saliency maps without using labeled data while improving the generalization ability?”. To this end, we present a novel perspective to unsupervised saliency detection through learning from multiple noisy labeling generated by “weak” and “noisy” unsupervised handcrafted saliency methods. Our end-to-end deep learning framework for unsupervised saliency detection consists of a latent saliency prediction module and a noise modeling module that work collaboratively and are optimized jointly. Explicit noise modeling enables us to deal with noisy saliency maps in a probabilistic way. Extensive experimental results on various benchmarking datasets show that our model not only outperforms all the unsupervised saliency methods with a large margin but also achieves comparable performance with the recent state-of-the-art supervised deep saliency methods.

    目前的深度显著性检测方法的成功在很大程度上取决于是否有大规模的逐像素标注形式的监督。这样的监督不仅耗费人力且并不总是可行,还往往会阻碍所学模型的泛化能力。相比之下,传统的基于手工特征的无监督显著性检测方法,尽管已经被深度监督方法所超越,但通常是独立于数据集的,可以在任意场景应用。这就提出了一个自然的问题:"是否有可能在不使用标注数据的情况下学习显著图,同时提高泛化能力?"。为此,我们提出了一个新的视角,即从"弱"且"有噪声"的无监督手工显著性方法产生的多个噪声标签中学习无监督显著性检测。我们用于无监督显著性检测的端到端深度学习框架包括一个潜在的显著性预测模块和一个噪声建模模块,它们协同工作并共同优化。显式的噪声建模使我们能够以概率的方式来处理有噪声的显著图。在各种基准数据集上的大量实验结果表明,我们的模型不仅以很大的幅度超过了所有的无监督显著性方法,而且还取得了与最近最先进的有监督深度显著性方法相当的性能。
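
    A simplified sketch of the noise-modelling idea: each noisy unsupervised map is explained as the latent saliency prediction plus a method-specific noise estimate, and a penalty keeps the estimated noise small so it cannot absorb the whole signal. The paper treats the noise probabilistically; the deterministic L1 data term, the tensor layout and the weight alpha here are assumptions.

        import torch
        import torch.nn.functional as F

        def noisy_label_loss(pred, noise_maps, noisy_labels, alpha=0.1):
            """Joint objective for learning saliency from multiple noisy labels.

            pred:         (B, 1, H, W) latent saliency prediction in [0, 1].
            noise_maps:   (B, M, H, W) estimated noise for each of the M
                          unsupervised methods (output of a noise module).
            noisy_labels: (B, M, H, W) saliency maps produced by the M methods.
            """
            recon = (pred + noise_maps).clamp(0.0, 1.0)        # broadcast over M
            data_term = F.l1_loss(recon, noisy_labels)         # explain each noisy map
            noise_prior = noise_maps.pow(2).mean()             # keep noise small
            return data_term + alpha * noise_prior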


    (CVPR’19) Multi-source weak supervision for saliency detection

    The high cost of pixel-level annotations makes it appealing to train saliency detection models with weak supervision. However, a single weak supervision source usually does not contain enough information to train a well-performing model. To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources. In this paper, we use category labels, captions, and unlabelled data for training, yet other supervision sources can also be plugged into this flexible framework. We design a classification network (CNet) and a caption generation network (PNet), which learn to predict object categories and generate captions, respectively, meanwhile highlight the most important regions for corresponding tasks. An attention transfer loss is designed to transmit supervision signal between networks, such that the network designed to be trained with one supervision source can benefit from another. An attention coherence loss is defined on unlabelled data to encourage the networks to detect generally salient regions instead of task-specific regions. We use CNet and PNet to generate pixel-level pseudo labels to train a saliency prediction network (SNet). During the testing phases, we only need SNet to predict saliency maps. Experiments demonstrate the performance of our method compares favourably against unsupervised and weakly supervised methods and even some supervised methods.

    像素级注释的高成本使得用弱监督来训练显著性检测模型很有吸引力。然而,单一的弱监督源通常并不包含足够的信息来训练一个表现良好的模型。为此,我们提出了一个统一的框架,用不同的弱监督源来训练显著性检测模型。在本文中,我们使用类别标签、标题和未标注的数据进行训练,并且其他监督源也可以插入到这个灵活的框架中。我们设计了一个分类网络(CNet)和一个标题生成网络(PNet),它们分别学习预测物体类别和生成标题,同时突出相应任务的最重要区域。注意力转移损失的设计是为了在网络之间传输监督信号,这样,用一个监督源训练的网络可以从另一个监督源中受益。注意力一致性损失被定义在未标记的数据上,以鼓励网络检测一般的显著区域而不是特定的任务区域。我们使用CNet和PNet来生成像素级的伪标签来训练一个显著性预测网络(SNet)。在测试阶段,我们只需要SNet来预测显著图。实验表明,我们的方法与无监督和弱监督的方法,甚至一些有监督的方法相比,性能都很好。


    (CVPR’20) Weakly-Supervised Salient Object Detection via Scribble Annotations

    Compared with laborious pixel-wise dense labeling, it is much easier to label data by scribbles, which only costs 1∼2 seconds to label one image. However, using scribble labels to learn salient object detection has not been explored. In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations. In doing so, we first relabel an existing large-scale salient object detection dataset with scribbles, namely S-DUTS dataset. Since object structure and detail information is not identified by scribbles, directly training with scribble labels will lead to saliency maps of poor boundary localization. To mitigate this problem, we propose an auxiliary edge detection task to localize object edges explicitly, and a gated structure-aware loss to place constraints on the scope of structure to be recovered. Moreover, we design a scribble boosting scheme to iteratively consolidate our scribble annotations, which are then employed as supervision to learn high-quality saliency maps. As existing saliency evaluation metrics neglect to measure structure alignment of the predictions, the saliency map ranking metric may not comply with human perception. We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps, which is more consistent with human perception. Extensive experiments on six benchmark datasets demonstrate that our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.

    与费力的像素级密集标注相比,用涂鸦标注数据要容易得多,标注一张图片只需花费1∼2秒。然而,使用涂鸦标签来学习显著目标检测还没有被探索过。在本文中,我们提出了一个弱监督的显著目标检测模型,从这种标注中学习显著性。在此过程中,我们首先用涂鸦重新标注了现有的大规模显著目标检测数据集,记为S-DUTS数据集。由于物体的结构和细节信息无法由涂鸦标注体现,直接用涂鸦标签进行训练会导致显著图的边界定位不佳。为了缓解这个问题,我们提出了一个辅助的边缘检测任务来显式定位物体的边缘,以及一个门控的结构感知损失来对要恢复的结构范围进行约束。此外,我们设计了一个涂鸦增强方案,以迭代地巩固我们的涂鸦标注,然后将其作为监督来学习高质量的显著图。由于现有的显著性评价指标忽视了对预测结果结构对齐程度的度量,基于它们的显著图排序可能不符合人类的感知。我们提出了一个新的指标,称为显著性结构度量(saliency structure measure),来衡量预测显著图的结构对齐程度,这更符合人类的感知。在六个基准数据集上进行的大量实验表明,我们的方法不仅优于现有的弱监督/无监督方法,而且与几个完全监督的最先进的模型相当。
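
    A minimal sketch of the partial cross-entropy typically used with scribble supervision: only scribble-annotated pixels contribute to the loss, so the sparse labels alone drive training while unlabeled pixels are left to the auxiliary edge task and the gated structure-aware loss. The label encoding (255 marking unlabeled pixels) and the function name are assumptions, not the paper's exact formulation.

        import torch
        import torch.nn.functional as F

        def partial_bce_loss(logits, scribble):
            """Binary cross-entropy restricted to scribble-annotated pixels.

            logits:   (B, 1, H, W) raw saliency logits.
            scribble: (B, 1, H, W) integer map with 1 = foreground scribble,
                      0 = background scribble, 255 = unlabeled pixel.
            """
            labeled = scribble != 255
            if labeled.sum() == 0:
                return logits.sum() * 0.0                      # keep the graph valid
            return F.binary_cross_entropy_with_logits(
                logits[labeled], scribble[labeled].float(), reduction="mean")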


    (AAAI’21) Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence

    Sparse labels have been attracting much attention in recent years. However, the performance gap between weakly supervised and fully supervised salient object detection methods is huge, and most previous weakly supervised works adopt complex training methods with many bells and whistles. In this work, we propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations without pre/post-processing operations or extra supervision data. Since scribble labels fail to offer detailed salient regions, we propose a local coherence loss to propagate the labels to unlabeled regions based on image features and pixel distance, so as to predict integral salient regions with complete object structures. We design a saliency structure consistency loss as self-consistent mechanism to ensure consistent saliency maps are predicted with different scales of the same image as input, which could be viewed as a regularization technique to enhance the model generalization ability. Additionally, we design an aggregation module (AGGM) to better integrate high-level features, low-level features and global context information for the decoder to aggregate various information. Extensive experiments show that our method achieves a new state-of-the-art performance on six benchmarks (e.g. for the ECSSD dataset: F_\beta = 0.8995, E_\xi = 0.9079 and MAE = 0.0489), with an average gain of 4.60% for F-measure, 2.05% for E-measure and 1.88% for MAE over the previous best method on this task.

    近几年来,稀疏标签一直备受关注。然而,弱监督与完全监督的SOD方法之间的性能差距是巨大的,并且以前的大多数弱监督方法都采用了复杂的训练过程与花哨的设计技巧。在本文中,我们提出了一个通过草图标注(scribble annotation)来进行弱监督显著目标检测的单轮端到端训练方法,不需要预处理/后处理操作或者额外的监督数据。由于草图标签不能提供详细的显著区域,我们提出了一个局部一致性损失,根据图像特征与像素距离来将标签传播到未标记的区域,从而预测具有完整物体结构的整体显著区域。我们设计了一个显著性结构一致性损失作为自洽机制,以确保以同一图像的不同尺度作为输入时,输出一致的显著图,其可以被看做一种正则化技术,来提高模型的泛化能力。此外,我们还设计了一个聚合模块(AGGM),以更好地整合高级特征、低级特征与全局上下文信息,供解码器聚合各种信息。大量的实验表明,我们的方法在六个基准测试上取得了新的SOTA(例如在ECSSD数据集上:F_\beta = 0.8995、E_\xi = 0.9079、MAE = 0.0489),与此前该任务上的最佳方法相比,F-measure平均提升4.60%,E-measure平均提升2.05%,MAE平均改善1.88%。
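
    A minimal sketch of the scale self-consistency idea, assuming `model` maps an image tensor to a sigmoid saliency map: the same image is predicted at two scales and the two maps are pushed to agree, which regularizes the model when only scribbles are available. The paper's structure consistency term is built on SSIM; the plain L1 distance and the scale factor here are simplifications.

        import torch.nn.functional as F

        def scale_consistency_loss(model, image, scale=0.75):
            """Encourage consistent saliency maps across input scales."""
            pred_full = model(image)                               # (B, 1, H, W)
            h, w = image.shape[-2:]
            small = F.interpolate(image, scale_factor=scale,
                                  mode="bilinear", align_corners=False)
            pred_small = model(small)
            pred_small_up = F.interpolate(pred_small, size=(h, w),
                                          mode="bilinear", align_corners=False)
            return F.l1_loss(pred_small_up, pred_full)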



    (CVPR’15) Visual Saliency Based on Multiscale Deep Features

    Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets.

    视觉显著性是认知科学和计算科学(包括计算机视觉)的一个基本问题。在本文中,我们发现高质量的视觉显著性模型可以从使用深度卷积神经网络(CNN)提取的多尺度特征中学习,该网络在许多视觉识别任务中已经取得了成功。为了学习这样的显著性模型,我们引入了一个神经网络架构,它在多个CNN上有全连接层,负责三个不同尺度的特征提取。然后,我们提出了一种细化方法,以增强我们显著性结果的空间一致性。最后,将不同级别的图像分割计算出的多个显著图汇总起来,可以进一步提高性能,产生比单一分割产生的显著图更好的结果。为了促进对视觉显著性模型的进一步研究和评估,我们还构建了一个新的大型数据库,其中包括4447张具有挑战性的图像及其像素级的显著性标注。实验结果表明,我们提出的方法能够在所有公开基准测试上取得SOTA,在MSRA-B数据集和我们的新数据集(HKU-IS)上,F-Measure分别提高了5.0%和13.2%,在这两个数据集上,平均绝对误差(MAE)分别降低了5.7%和35.1%。


    (CVPR’15) Deep Networks for Saliency Detection via Local Estimation and Global Search

    This paper presents a saliency detection algorithm by integrating both local estimation and global search. In the local estimation stage, we detect local saliency by using a deep neural network (DNN-L) which learns local patch features to determine the saliency value of each pixel. The estimated local saliency maps are further refined by exploring the high level object concepts. In the global search stage, the local saliency map together with global contrast and geometric information are used as global features to describe a set of object candidate regions. Another deep neural network (DNN-G) is trained to predict the saliency score of each object region based on the global features. The final saliency map is generated by a weighted sum of salient object regions. Our method presents two interesting insights. First, local features learned by a supervised scheme can effectively capture local contrast, texture and shape information for saliency detection. Second, the complex relationship between different global saliency cues can be captured by deep networks and exploited principally rather than heuristically. Quantitative and qualitative experiments on several benchmark data sets demonstrate that our algorithm performs favorably against the state-of-the-art methods.

    本文通过融合局部估计和全局搜索提出了一种显著性检测算法。在局部估计阶段,我们通过使用深度神经网络(DNN-L)来检测局部显著性,该网络学习局部块特征来确定每个像素的显著值。通过探索高级的对象概念,进一步完善估计的局部显著图。在全局搜索阶段,局部显著图与全局对比度和几何信息一起被用作全局特征来描述一组对象候选区域。另一个深度神经网络(DNN-G)被训练来预测基于全局特征的每个对象区域的显著性分数。最终的显著图是由显著性对象区域的加权和产生的。我们的方法提出了两个有趣的见解。首先,通过监督方案学习的局部特征可以有效地捕捉局部对比度、纹理和形状信息,用于显著性检测。第二,不同的全局显著性线索之间的复杂关系可以被深度网络所捕捉并被利用,而非启发式地使用。在几个基准数据集上进行的定量和定性实验表明,我们的算法与SOTA相比表现良好。


    (CVPR’16) Deep Contrast Learning for Salient Object Detection

    Salient object detection has recently witnessed substantial progress due to powerful features extracted using deep convolutional neural networks (CNNs). However, existing CNN-based methods operate at the patch level instead of the pixel level. Resulting saliency maps are typically blurry, especially near the boundary of salient objects. Furthermore, image patches are treated as independent samples even when they are overlapping, giving rise to significant redundancy in computation and storage. In this paper, we propose an end-to-end deep contrast network to overcome the aforementioned limitations. Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream. The first stream directly produces a saliency map with pixel-level accuracy from an input image. The second stream extracts segment-wise features very efficiently, and better models saliency discontinuities along object boundaries. Finally, a fully connected CRF model can be optionally incorporated to improve spatial coherence and contour localization in the fused result from these two streams. Experimental results demonstrate that our deep model significantly improves the state of the art.

    由于使用深度卷积神经网络(CNN)提取的强大特征,显著目标检测最近取得了实质性进展。然而,现有的基于CNN的方法是在图像块级而不是像素级进行操作。由此产生的显著图通常是模糊的,尤其是在显著性对象的边界附近。此外,即使图像块相互重叠,它们也被视为独立的样本,从而在计算和存储中产生大量的冗余。在本文中,我们提出了一个端到端的深度对比网络来克服上述的局限。我们的深度网络由两个互补的部分组成,一个像素级的全卷积流和一个分段式的空间池化流。第一个流直接从输入图像中产生一个具有像素级精度的显著图。第二个流非常高效地提取分段特征,并更好地建模对象边界上的显著性不连续现象。最后,一个全连接的CRF模型可以被选择性地引入,以改善这两个流的融合结果中的空间一致性和轮廓定位。实验结果表明,我们的深度模型明显提升了SOTA。


    (CVPR’16) DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection

    Traditional salient object detection models often use hand-crafted features to formulate contrast and various prior knowledge, and then combine them artificially. In this work, we propose a novel end-to-end deep hierarchical saliency network (DHSNet) based on convolutional neural networks for detecting salient objects. DHSNet first makes a coarse global prediction by automatically learning various global structured saliency cues, including global contrast, objectness, compactness, and their optimal combination. Then a novel hierarchical recurrent convolutional neural network (HRCNN) is adopted to further hierarchically and progressively refine the details of saliency maps step by step via integrating local context information. The whole architecture works in a global to local and coarse to fine manner. DHSNet is directly trained using whole images and corresponding ground truth saliency masks. When testing, saliency maps can be generated by directly and efficiently feedforwarding testing images through the network, without relying on any other techniques. Evaluations on four benchmark datasets and comparisons with other 11 state-of-the-art algorithms demonstrate that DHSNet not only shows its significant superiority in terms of performance, but also achieves a real-time speed of 23 FPS on modern GPUs.

    传统的显著目标检测模型通常使用手工制作的特征来制定对比度和各种先验知识,然后人为地将它们结合起来。在这项工作中,我们提出了一个新颖的基于卷积神经网络的端到端深度分层显著性网络(DHSNet),用于检测显著对象。DHSNet首先通过自动学习各种全局结构化的显著性线索,包括全局对比度、对象性、紧凑性以及它们的最佳组合,进行粗略的全局预测。然后,采用新型的分层递归卷积神经网络(HRCNN),通过整合局部上下文信息,进一步分层逐步细化显著图的细节。整个架构以全局到局部和从粗到细的方式工作。DHSNet直接使用整幅图像和相应的真值的显著性mask进行训练。在测试时,可以通过网络直接有效地前馈测试图像来生成显著图,而不需要依赖任何其他技术。对四个基准数据集的评估以及与其他11种SOTA的比较表明,DHSNet不仅在性能上显示出明显的优势,而且在现代GPU上达到了23FPS的实时速度。


    (CVPR’16) Deep Saliency with Encoded Low level Distance Map and High Level Features

    Recent advances in saliency detection have utilized deep learning to obtain high level features to detect salient regions in a scene. These advances have demonstrated superior results over previous works that utilize hand-crafted low level features for saliency detection. In this paper, we demonstrate that hand-crafted features can provide complementary information to enhance performance of saliency detection that utilizes only high level features. Our method utilizes both high level and low level features for saliency detection under a unified deep learning framework. The high level features are extracted using the VGG-net, and the low level features are compared with other parts of an image to form a low level distance map. The low level distance map is then encoded using a convolutional neural network(CNN) with multiple 1 × 1 convolutional and ReLU layers. We concatenate the encoded low level distance map and the high level features, and connect them to a fully connected neural network classifier to evaluate the saliency of a query region. Our experiments show that our method can further improve the performance of state-of-the-art deep learning-based saliency detection methods.

    显著性检测的最新进展是利用深度学习获得高级特征来检测场景中的显著区域。这些进展的结果优于以往利用手工制作的低级特征进行显著性检测的工作。在本文中,我们证明了手工制作的特征可以提供互补信息,以提高只利用高级特征的显著性检测的性能。我们的方法在一个统一的深度学习框架下利用高级和低级的特征进行显著性检测。高级特征是用VGG网络提取的,而低级特征则通过与图像其他部分进行比较来形成低级距离图。然后,低级距离图用一个具有多个1×1卷积层和ReLU层的卷积神经网络(CNN)进行编码。我们将编码后的低级距离图和高级特征拼接起来,并将它们连接到一个全连接的神经网络分类器,以评估查询区域的显著性。实验表明,我们的方法可以进一步提高基于深度学习的显著性检测SOTA的性能。


    (CVPR’17) Non-Local Deep Features for Salient Object Detection

    Saliency detection aims to highlight the most relevant objects in an image. Methods using conventional models struggle whenever salient objects are pictured on top of a cluttered background while deep neural nets suffer from excess complexity and slow evaluation speeds. In this paper, we propose a simplified convolutional neural network which combines local and global information through a multiresolution 4 × 5 grid structure. Instead of enforcing spacial coherence with a CRF or superpixels as is usually the case, we implemented a loss function inspired by the MumfordShah functional which penalizes errors on the boundary. We trained our model on the MSRA-B dataset, and tested it on six different saliency benchmark datasets. Results show that our method is on par with the state-of-the-art while reducing computation time by a factor of 18 to 100 times, enabling near real-time, high performance saliency detection.

    显著性检测的目的是突出图像中最相关的物体。每当显著物体出现在杂乱的背景上时,使用传统模型的方法就会陷入困境,而深度神经网络则存在复杂度过高、推理速度慢的问题。在本文中,我们提出了一个简化的卷积神经网络,它通过一个多分辨率的4×5网格结构将局部和全局信息结合起来。我们没有像通常那样用CRF或超像素来强制保证空间一致性,而是实现了一个受Mumford-Shah泛函启发的损失函数,对边界上的错误进行惩罚。我们在MSRA-B数据集上训练了我们的模型,并在六个不同的显著性基准数据集上对其进行了测试。结果表明,我们的方法与SOTA相当,同时将计算时间减少了18到100倍,从而实现了近乎实时的高性能显著性检测。


    (CVPR’17) Deeply Supervised Salient Object Detection with Short Connections

    Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-theart results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis on the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons.

    最近在显著目标检测方面取得了重大进展,主要受益于卷积神经网络(CNN)的爆炸性发展。最近开发的语义分割和显著目标检测算法大多是基于全卷积神经网络(FCN)。与没有明确处理尺度空间问题的通用FCN模型相比,仍有很大的改进空间。整体嵌套边缘检测器(HED)为边缘和边界检测提供了一个具有深度监督的跳层结构,但HED在显著性检测上的性能提升并不明显。在本文中,我们通过在HED架构内部的跳层结构中引入短连接,提出了一种新的显著性物体检测方法。我们的框架充分利用了从FCN中提取的多层次和多尺度的特征,在每一层都提供了更高级的表征,而这一特征是进行分割和检测所迫切需要的。我们的方法在5个广泛测试的显著目标检测基准上产生了SOTA,在效率(每幅图像0.08秒)、有效性和简单性方面比现有算法更有优势。除此之外,我们对训练数据对性能的作用进行了详尽的分析。我们的实验结果为未来的研究和公平的比较提供了一个更合理和强大的训练集。


    (ICCV’17) A Stagewise Refinement Model for Detecting Salient Objects in Images

    Deep convolutional neural networks (CNNs) have been successfully applied to a wide variety of problems in computer vision, including salient object detection. To detect and segment salient objects accurately, it is necessary to extract and combine high-level semantic features with low-level fine details simultaneously. This happens to be a challenge for CNNs as repeated subsampling operations such as pooling and convolution lead to a significant decrease in the initial image resolution, which results in loss of spatial details and finer structures. To remedy this problem, here we propose to augment feedforward neural networks with a novel pyramid pooling module and a multi-stage refinement mechanism for saliency detection. First, our deep feedward net is used to generate a coarse prediction map with much detailed structures lost. Then, refinement nets are integrated with local context information to refine the preceding saliency maps generated in the master branch in a stagewise manner. Further, a pyramid pooling module is applied for different-region-based global context aggregation. Empirical evaluations over six benchmark datasets show that our proposed method compares favorably against the state-of-the-art approaches.

    深度卷积神经网络(CNN)已经成功地应用于计算机视觉中的各种问题,包括显著目标检测。为了准确地检测和分割显著物体,有必要同时提取和结合高级语义特征和低级精细细节。这对CNN来说是一个挑战,因为重复的下采样操作,如池化和卷积,会导致初始图像分辨率大幅下降,从而导致空间细节和精细结构的损失。为了解决这个问题,我们在这里提出用一个新颖的金字塔池化模块和一个多阶段细化机制来增强前馈神经网络的显著性检测。首先,我们的深度前馈网络被用来生成一个粗略的预测图,其中大量细节结构已经丢失。然后,细化网络与局部上下文信息相结合,以分阶段的方式细化主分支中先前产生的显著图。此外,一个金字塔池化模块被应用于基于不同区域的全局上下文聚合。对六个基准数据集的实证评估表明,我们提出的方法与SOTA相比更有优势。
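
    A sketch of region-based global context aggregation in the spirit of the pyramid pooling module mentioned above. The structure follows the widely used PSPNet-style module; bin sizes and channel counts are illustrative rather than the paper's exact configuration.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PyramidPooling(nn.Module):
            """Average-pool features to several grid sizes, project them with
            1x1 convolutions, upsample back and concatenate with the input."""

            def __init__(self, in_ch, bins=(1, 2, 3, 6)):
                super().__init__()
                branch_ch = in_ch // len(bins)
                self.branches = nn.ModuleList([
                    nn.Sequential(nn.AdaptiveAvgPool2d(b),
                                  nn.Conv2d(in_ch, branch_ch, 1),
                                  nn.ReLU(inplace=True))
                    for b in bins])
                self.fuse = nn.Conv2d(in_ch + branch_ch * len(bins), in_ch, 1)

            def forward(self, x):
                h, w = x.shape[-2:]
                outs = [x]
                for branch in self.branches:
                    y = F.interpolate(branch(x), size=(h, w),
                                      mode="bilinear", align_corners=False)
                    outs.append(y)
                return self.fuse(torch.cat(outs, dim=1))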


    (ICCV’17) Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection

    Fully convolutional neural networks (FCNs) have shown outstanding performance in many dense labeling problems. One key pillar of these successes is mining relevant information from features in convolutional layers. However, how to better aggregate multi-level convolutional feature maps for salient object detection is underexplored. In this work, we present Amulet, a generic aggregating multi-level convolutional feature framework for salient object detection. Our framework first integrates multi-level feature maps into multiple resolutions, which simultaneously incorporate coarse semantics and fine details. Then it adaptively learns to combine these feature maps at each resolution and predict saliency maps with the combined features. Finally, the predicted results are efficiently fused to generate the final saliency map. In addition, to achieve accurate boundary inference and semantic enhancement, edge-aware feature maps in low-level layers and the predicted results of low resolution features are recursively embedded into the learning framework. By aggregating multi-level convolutional features in this efficient and flexible manner, the proposed saliency model provides accurate salient object labeling. Comprehensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of near all compared evaluation metrics.

    全卷积神经网络(FCN)在许多密集标签问题上表现出了出色的性能。这些成功的一个关键支柱是从卷积层的特征中挖掘相关信息。然而,如何更好地融合多级卷积特征图以进行显著目标检测还没有得到充分的探索。在这项工作中,我们提出了Amulet,一个用于显著目标检测的通用融合多级卷积特征框架。我们的框架首先将多级特征图融合到多个分辨率中,这些分辨率同时包含粗略的语义和精细的细节。然后,它自适应地学习在每个分辨率下结合这些特征图,并通过结合的特征预测显著图。最后,预测的结果被有效地融合以生成最终的显著图。此外,为了实现准确的边界推理和语义增强,低层的边缘感知特征图和低分辨率特征的预测结果被递归地嵌入到学习框架中。通过以这种高效和灵活的方式融合多级卷积特征,所提出的显著性模型提供了准确的显著性物体标签。综合实验表明,我们的方法在几乎所有比较的评价指标方面都比SOTA表现更好。


    (ICCV’17) Learning Uncertain Convolutional Features for Accurate Saliency Detection

    Deep convolutional neural networks (CNNs) have delivered superior performance in many computer vision tasks. In this paper, we propose a novel deep fully convolutional network model for accurate salient object detection. The key contribution of this work is to learn deep uncertain convolutional features (UCF), which encourage the robustness and accuracy of saliency detection. We achieve this via introducing a reformulated dropout (R-dropout) after specific convolutional layers to construct an uncertain ensemble of internal feature units. In addition, we propose an effective hybrid upsampling method to reduce the checkerboard artifacts of deconvolution operators in our decoder network. The proposed methods can also be applied to other deep convolutional networks. Compared with existing saliency detection methods, the proposed UCF model is able to incorporate uncertainties for more accurate object boundary inference. Extensive experiments demonstrate that our proposed saliency model performs favorably against state-ofthe-art approaches. The uncertain feature learning mechanism as well as the upsampling method can significantly improve performance on other pixel-wise vision tasks.

    深度卷积神经网络(CNN)在许多计算机视觉任务中都有出色的表现。在本文中,我们提出了一个新的深度全卷积网络模型,用于准确的显著目标检测。这项工作的主要贡献是学习深度不确定卷积特征(UCF),以提升显著性检测的鲁棒性和准确性。我们通过在特定的卷积层之后引入一个reformulated dropout(R-dropout)来构建内部特征单元的不确定性集成来实现这一目标。此外,我们提出了一种有效的混合上采样方法,以减少我们解码器网络中反卷积算子的棋盘伪像。所提出的方法也可以应用于其他深度卷积网络。与现有的显著性检测方法相比,所提出的UCF模型能够纳入不确定性因素,以获得更准确的物体边界推断。广泛的实验表明,我们提出的显著性模型与现有的方法相比表现良好。不确定特征学习机制以及上采样方法可以显著提高其他像素级视觉任务的性能。
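
    A sketch of one common way to realize hybrid upsampling that damps checkerboard artifacts: a learned transposed-convolution path is averaged with an interpolate-then-convolve path, whose kernel overlap is uniform. The paper's hybrid scheme may differ in detail; the layer choices below are assumptions.

        import torch.nn as nn
        import torch.nn.functional as F

        class HybridUpsample(nn.Module):
            """2x upsampling that mixes a deconvolution path with a
            bilinear-resize-plus-convolution path."""

            def __init__(self, in_ch, out_ch):
                super().__init__()
                self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                                 stride=2, padding=1)
                self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

            def forward(self, x):
                learned = self.deconv(x)                       # may show checkerboard
                smooth = self.conv(F.interpolate(x, scale_factor=2,
                                                 mode="bilinear",
                                                 align_corners=False))
                return 0.5 * (learned + smooth)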


    (CVPR’18) Detect Globally, Refine Locally: A Novel Approach to Saliency Detection

    Effective integration of contextual information is crucial for salient object detection. To achieve this, most existing methods based on ’skip’ architecture mainly focus on how to integrate hierarchical features of Convolutional Neural Networks (CNNs). They simply apply concatenation or element-wise operation to incorporate high-level semantic cues and low-level detailed information. However, this can degrade the quality of predictions because cluttered and noisy information can also be passed through. To address this problem, we proposes a global Recurrent Localization Network (RLN) which exploits contextual information by the weighted response map in order to localize salient objects more accurately. Particularly, a recurrent module is employed to progressively refine the inner structure of the CNN over multiple time steps. Moreover, to effectively recover object boundaries, we propose a local Boundary Refinement Network (BRN) to adaptively learn the local contextual information for each spatial position. The learned propagation coefficients can be used to optimally capture relations between each pixel and its neighbors. Experiments on five challenging datasets show that our approach performs favorably against all existing methods in terms of the popular evaluation metrics.

    有效整合上下文信息对于显著目标检测至关重要。为了实现这一点,大多数现有的基于"跳过"架构的方法主要集中在如何整合卷积神经网络(CNN)的分层特征。他们只是简单地应用拼接或逐元素操作来纳入高层次的语义线索和低层次的细节信息。然而,这可能会降低预测的质量,因为杂乱和有噪声的信息也会被一并传入。为了解决这个问题,我们提出了一个全局性的循环定位网络(RLN),它通过加权响应图来利用上下文信息,以便更准确地定位显著物体。具体来说,一个递归模块被用来在多个时间步骤中逐步完善CNN的内部结构。此外,为了有效地恢复物体的边界,我们提出了一个局部的边界细化网络(BRN)来自适应地学习每个空间位置的局部上下文信息。学习到的传播系数可以用来最佳地捕捉每个像素与其邻域像素之间的关系。在五个具有挑战性的数据集上的实验表明,我们的方法在流行的评估指标方面比所有现有的方法都表现得好。


    (CVPR’18) PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection

    Contexts play an important role in the saliency detection task. However, given a context region, not all contextual information is helpful for the final task. In this paper, we propose a novel pixel-wise contextual attention network, i.e., the PiCANet, to learn to selectively attend to informative context locations for each pixel. Specifically, for each pixel, it can generate an attention map in which each attention weight corresponds to the contextual relevance at each context location. An attended contextual feature can then be constructed by selectively aggregating the contextual information. We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively. Both models are fully differentiable and can be embedded into CNNs for joint training. We also incorporate the proposed models with the U-Net architecture to detect salient objects. Extensive experiments show that the proposed PiCANets can consistently improve saliency detection performance. The global and local PiCANets facilitate learning global contrast and homogeneousness, respectively. As a result, our saliency model can detect salient objects more accurately and uniformly, thus performing favorably against the state-of-the-art methods.

    上下文在显著性检测任务中起着重要作用。然而,给定一个上下文区域,并非所有的上下文信息都对最终的任务有帮助。在本文中,我们提出了一个新的像素级上下文注意力网络,即PiCANet,以学习有选择地关注每个像素的信息丰富的上下文位置。具体来说,对于每个像素,它可以生成一个注意力图,其中每个注意力权重对应于每个上下文位置的上下文相关性。然后,通过有选择地汇总上下文信息,可以构建一个被关注的上下文特征。我们提出的PiCANet有全局和局部两种形式,分别用于关注全局和局部上下文。这两种模型都是完全可微的,可以嵌入到CNN中进行联合训练。我们还将提出的模型与U-Net架构结合起来,以检测显著物体。广泛的实验表明,所提出的PiCANet可以持续改善显著性检测性能。全局和局部PiCANet分别有助于学习全局对比度和同质性。因此,我们的显著性模型可以更准确、更统一地检测出显著性物体,从而在与SOTA相比时表现出优势。
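
    A sketch of per-pixel attention over a local neighbourhood, in the spirit of the local form described above: each pixel predicts softmax weights over its k x k surroundings and aggregates the corresponding features into an attended contextual feature. How the weights are predicted and the kernel size are illustrative, not the paper's exact design (the global form would attend over the whole map instead).

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class LocalPixelAttention(nn.Module):
            def __init__(self, ch, k=7):
                super().__init__()
                self.k = k
                self.weight_pred = nn.Conv2d(ch, k * k, kernel_size=3, padding=1)

            def forward(self, x):
                b, c, h, w = x.shape
                k2 = self.k * self.k
                # per-pixel attention weights over the k*k neighbourhood
                att = torch.softmax(self.weight_pred(x), dim=1)        # (B, k*k, H, W)
                # gather every pixel's neighbourhood features
                ctx = F.unfold(x, kernel_size=self.k, padding=self.k // 2)
                ctx = ctx.view(b, c, k2, h * w)                        # (B, C, k*k, H*W)
                att = att.view(b, 1, k2, h * w)
                out = (ctx * att).sum(dim=2)                           # weighted sum
                return out.view(b, c, h, w)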


    (CVPR’18) A Bi-directional Message Passing Model for Salient Object Detection

    Recent progress on salient object detection is beneficial from Fully Convolutional Neural Network (FCN). The saliency cues contained in multi-level convolutional features are complementary for detecting salient objects. How to integrate multi-level features becomes an open problem in saliency detection. In this paper, we propose a novel bi-directional message passing model to integrate multilevel features for salient object detection. At first, we adopt a Multi-scale Context-aware Feature Extraction Module (MCFEM) for multi-level feature maps to capture rich context information. Then a bi-directional structure is designed to pass messages between multi-level features, and a gate function is exploited to control the message passing rate. We use the features after message passing, which simultaneously encode semantic information and spatial details, to predict saliency maps. Finally, the predicted results are efficiently combined to generate the final saliency map. Quantitative and qualitative experiments on five benchmark datasets demonstrate that our proposed model performs favorably against the state-of-the-art methods under different evaluation metrics.

    显著目标检测的最新进展受益于全卷积神经网络(FCN)。多级卷积特征中包含的显著性线索对于检测显著性物体是互补的。如何整合多级特征成为显著性检测的一个开放性问题。在本文中,我们提出了一个新颖的双向信息传递模型,以整合多级特征来进行显著目标检测。首先,我们对多级特征图采用多尺度上下文感知特征提取模块(MCFEM),以捕获丰富的上下文信息。然后,我们设计了一个双向结构,在多级特征之间传递信息,并利用一个门函数来控制信息传递率。我们使用信息传递后的特征,同时编码语义信息和空间细节,来预测显著图。最后,预测的结果被有效地结合起来,生成最终的显著图。在五个基准数据集上进行的定量和定性实验表明,我们提出的模型在不同的评估指标下与SOTA相比表现良好。


    (CVPR’18) Progressive Attention Guided Recurrent Network for Salient Object Detection

    Effective convolutional features play an important role in saliency estimation but how to learn powerful features for saliency is still a challenging task. FCN-based methods directly apply multi-level convolutional features without distinction, which leads to sub-optimal results due to the distraction from redundant details. In this paper, we propose a novel attention guided network which selectively integrates multi-level contextual information in a progressive manner. Attentive features generated by our network can alleviate distraction of background thus achieve better performance. On the other hand, it is observed that most of existing algorithms conduct salient object detection by exploiting side-output features of the backbone feature extraction network. However, shallower layers of backbone network lack the ability to obtain global semantic information, which limits the effective feature learning. To address the problem, we introduce multi-path recurrent feedback to enhance our proposed progressive attention driven framework. Through multi-path recurrent connections, global semantic information from the top convolutional layer is transferred to shallower layers, which intrinsically refines the entire network. Experimental results on six benchmark datasets demonstrate that our algorithm performs favorably against the state-of-the-art approaches.

    有效的卷积特征在显著性估计中起着重要作用,但如何学习强大的显著性特征仍然是一项具有挑战性的任务。基于FCN的方法不加区分地直接应用多级卷积特征,由于受到冗余细节的干扰,导致了次优的结果。在本文中,我们提出了一种新的注意力引导网络,它以渐进的方式选择性地融合多级上下文信息。由我们的网络产生的注意力特征可以减轻背景的干扰,从而达到更好的性能。另一方面,我们发现大多数现有的算法都是通过利用主干特征提取网络的侧面输出特征来进行显著目标检测。然而,主干网络的较浅层缺乏获得全局语义信息的能力,这限制了有效的特征学习。为了解决这个问题,我们引入了多路径递归反馈来加强我们提出的渐进式注意力驱动框架。通过多路递归连接,来自高层卷积的全局语义信息被转移到较浅的层,这在本质上完善了整个网络。在六个基准数据集上的实验结果表明,我们的算法与SOTA相比表现良好。

    (ECCV’18) Contour Knowledge Transfer for Salient Object Detection

    In recent years, deep Convolutional Neural Networks (CNNs) have broken all records in salient object detection. However, training such a deep model requires a large amount of manual annotations. Our goal is to overcome this limitation by automatically converting an existing deep contour detection model into a salient object detection model without using any manual salient object masks. For this purpose, we have created a deep network architecture, namely Contour-to-Saliency Network (C2SNet), by grafting a new branch onto a well-trained contour detection network. Therefore, our C2S-Net has two branches for performing two different tasks: (1) predicting contours with the original contour branch, and (2) estimating per-pixel saliency score of each image with the newly added saliency branch. To bridge the gap between these two tasks, we further propose a contour-to-saliency transferring method to automatically generate salient object masks which can be used to train the saliency branch from outputs of the contour branch. Finally, we introduce a novel alternating training pipeline to gradually update the network parameters. In this scheme, the contour branch generates saliency masks for training the saliency branch, while the saliency branch, in turn, feeds back saliency knowledge in the form of saliency-aware contour labels, for fine-tuning the contour branch. The proposed method achieves state-of-the-art performance on five well-known benchmarks, outperforming existing fully supervised methods while also maintaining high efficiency.

    近年来,深度卷积神经网络(CNN)已经打破了显著目标检测的所有记录。然而,训练这样一个深度模型需要大量的人工标注。我们的目标是克服这一限制,将现有的深度轮廓检测模型自动转换为显著目标检测模型,而不使用任何人工显著物体mask。为此,我们创建了一个深度网络架构,即轮廓到显著性网络(C2SNet),将一个新的分支嫁接到一个训练好的轮廓检测网络上。因此,我们的C2S-Net有两个分支来执行两个不同的任务:(1)用原来的轮廓分支预测轮廓;(2)用新增加的显著性分支估计每个图像的逐像素显著性分数。为了弥补这两项任务之间的差距,我们进一步提出了一种轮廓到显著性的迁移方法,以自动生成显著性物体mask,这些mask可用于从轮廓分支的输出中训练显著性分支。最后,我们引入了一个新颖的交替训练流水线来逐步更新网络参数。在这个方案中,轮廓分支产生的显著性mask用于训练显著性分支,而显著性分支则以显著性感知的轮廓标签的形式反馈显著性知识,用于微调轮廓分支。所提出的方法在五个著名的基准上取得了最先进的性能,超过了现有的完全监督的方法,同时也保持了高效率。


    (ECCV’18) Reverse Attention for Salient Object Detection

    Benefit from the quick development of deep learning techniques, salient object detection has achieved remarkable progresses recently. However, there still exists following two major challenges that hinder its application in embedded devices, low resolution output and heavy model weight. To this end, this paper presents an accurate yet compact deep network for efficient salient object detection. More specifically, given a coarse saliency prediction in the deepest layer, we first employ residual learning to learn side-output residual features for saliency refinement, which can be achieved with very limited convolutional parameters while keep accuracy. Secondly, we further propose reverse attention to guide such side-output residual learning in a top-down manner. By erasing the current predicted salient regions from side-output features, the network can eventually explore the missing object parts and details which results in high resolution and accuracy. Experiments on six benchmark datasets demonstrate that the proposed approach compares favorably against state-of-the-art methods, and with advantages in terms of simplicity, efficiency (45 FPS) and model size (81 MB).

    受益于深度学习技术的快速发展,显著目标检测最近取得了可观的进展。然而,仍然存在以下两大挑战,阻碍了其在嵌入式设备中的应用,即输出分辨率低和模型过于庞大。为此,本文提出了一个准确而紧凑的深度网络,用于高效的显著目标检测。更具体地说,在最深层给定一个粗略的显著性预测,我们首先采用残差学习来学习侧面输出的残差特征,以实现显著性的细化,这只需非常有限的卷积参数即可实现,同时保持准确性。其次,我们进一步提出反向注意力,以自上而下的方式指导这种侧输出的残差学习。通过从侧输出特征中抹去当前预测的显著区域,网络最终可以探索缺失的物体部分和细节,从而获得高分辨率和高准确性。在六个基准数据集上的实验表明,所提出的方法与SOTA相比更有优势,并且在简单性、效率(45 FPS)和模型大小(81 MB)方面更优。
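
    The erasing step described above translates almost directly into code; a minimal sketch (function name and interpolation choices are assumptions):

        import torch
        import torch.nn.functional as F

        def reverse_attention(side_feat, coarse_pred):
            """Erase already-detected salient regions from side-output features.

            coarse_pred: (B, 1, h, w) saliency logits from a deeper stage.
            side_feat:   (B, C, H, W) shallower side-output features.
            The complement (1 - sigmoid) weights the shallow features, steering
            them toward missing object parts and boundary details.
            """
            up = F.interpolate(coarse_pred, size=side_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
            return side_feat * (1.0 - torch.sigmoid(up))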


    (CVPR’19) Attentive Feedback Network for Boundary-Aware Salient Object Detection

    Recent deep learning based salient object detection methods achieve gratifying performance built upon Fully Convolutional Neural Networks (FCNs). However, most of them have suffered from the boundary challenge. The state-of-the-art methods employ feature aggregation technique and can precisely find out wherein the salient object, but they often fail to segment out the entire object with fine boundaries, especially those raised narrow stripes. So there is still a large room for improvement over the FCN based models. In this paper, we design the Attentive Feedback Modules (AFMs) to better explore the structure of objects. A Boundary-Enhanced Loss (BEL) is further employed for learning exquisite boundaries. Our proposed deep model produces satisfying results on the object boundaries and achieves state-of-the-art performance on five widely tested salient object detection benchmarks. The network is in a fully convolutional fashion running at a speed of 26 FPS and does not need any post-processing.

    最近基于深度学习的显著目标检测方法在全卷积神经网络(FCN)的基础上取得了令人满意的性能。然而,它们中的大多数都面临边界问题的挑战。最先进的方法采用了特征融合技术,可以精确地找到显著物体的位置,但往往不能以精细的边界分割出完整的物体,特别是那些凸起的狭窄条状结构。因此,基于FCN的模型仍有很大的改进空间。在本文中,我们设计了注意力反馈模块(Attentive Feedback Module、AFM)来更好地探索物体的结构。边界增强损失(Boundary-Enhanced Loss、BEL)被进一步用于学习精细的边界。我们提出的深度模型在物体边界上产生了令人满意的结果,并在五个广泛测试的显著目标检测基准上实现了SOTA。该网络以完全卷积的方式运行,速度为26FPS,不需要任何后处理。


    (CVPR’19) Salient Object Detection With Pyramid Attention and Salient Edges

    This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. This enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged receptive field. The second contribution lies in the emphasis on the importance of salient edges. Salient edge information offers a strong cue to better segment salient objects and refine object boundaries. To this end, our model is equipped with a salient edge detection module, which is learned for precise salient boundary estimation. This encourages better edge-preserving salient object segmentation. Exhaustive experiments confirm that the proposed pyramid attention and salient edges are effective for salient object detection. We show that our deep saliency model outperforms state-of-the-art approaches for several benchmarks with a fast processing speed (25fps on one GPU).

    本文介绍了一种使用卷积神经网络(CNN)检测图像中显著物体的新方法。所提出的网络,名为PAGE-Net,提供了两个关键的贡献。首先是利用一个基础的金字塔注意力结构来检测显著的物体。这使得网络在考虑多尺度的显著性信息的同时,能够更多地集中在显著性区域。这样的堆叠式注意力设计提供了一个强有力的工具,可以有效地提高相应网络层的表征能力,并扩大了感受野。第二个贡献在于强调了显著边缘的重要性。显著的边缘信息为更好地分割显著物体和完善物体的边界提供了强有力的线索。为此,我们的模型配备了一个显著边缘检测模块,该模块是为精确的显著边界估计而学习的。这鼓励了更好的边缘保留的显著对象分割。详尽的实验证实,所提出的金字塔注意力和显著边缘对显著目标检测是有效的。我们表明,我们的深度显著性模型在几个基准测试中以快速的处理速度(单个GPU上为25fps)胜过SOTA。


    (CVPR’19) Pyramid Feature Attention Network for Saliency detection

    Saliency detection is one of the basic challenges in computer vision. How to extract effective features is a critical point for saliency detection. Recent methods mainly adopt integrating multi-scale convolutional features indiscriminately. However, not all features are useful for saliency detection and some even cause interferences. To solve this problem, we propose Pyramid Feature Attention network to focus on effective high-level context features and low-level spatial structural features. First, we design Context-aware Pyramid Feature Extraction (CPFE) module for multi-scale high-level feature maps to capture rich context features. Second, we adopt channel-wise attention (CA) after CPFE feature maps and spatial attention (SA) after low-level feature maps, then fuse outputs of CA & SA together. Finally, we propose an edge preservation loss to guide network to learn more detailed information in boundary localization. Extensive evaluations on five benchmark datasets demonstrate that the proposed method outperforms the state-of-the-art approaches under different evaluation metrics.

    显著性检测是计算机视觉的基本挑战之一。如何提取有效的特征是显著性检测的一个关键点。最近的方法主要是不加区分地采用融合多尺度卷积特征。然而,并非所有的特征都对显著性检测有用,有些甚至会造成干扰。为了解决这个问题,我们提出了金字塔特征注意力网络,以关注有效的高级上下文特征和低级空间结构特征。首先,我们设计了上下文感知的金字塔特征提取(CPFE)模块,用于多尺度高层特征图,以捕获丰富的上下文特征。其次,我们在CPFE特征图后采用通道注意力(CA),在低层次特征图后采用空间注意力(SA),然后将CA和SA的输出融合在一起。最后,我们提出了一个边缘保留损失,以指导网络在边界定位中学习更多的细节信息。在五个基准数据集上进行的广泛评估表明,所提出的方法在不同的评估指标下优于SOTA。
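
    A sketch of the two attention blocks described above, using a squeeze-and-excitation style channel attention for the high-level CPFE features and a single-channel spatial attention for the low-level features; the reduction ratio and kernel size are illustrative, and the paper's exact CA/SA designs may differ.

        import torch
        import torch.nn as nn

        class ChannelAttention(nn.Module):
            """Re-weight channels from globally pooled statistics."""
            def __init__(self, ch, reduction=4):
                super().__init__()
                self.fc = nn.Sequential(
                    nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
                    nn.Linear(ch // reduction, ch), nn.Sigmoid())

            def forward(self, x):
                w = self.fc(x.mean(dim=(2, 3)))                # (B, C)
                return x * w.unsqueeze(-1).unsqueeze(-1)

        class SpatialAttention(nn.Module):
            """Re-weight spatial positions with a single attention map."""
            def __init__(self, ch):
                super().__init__()
                self.conv = nn.Conv2d(ch, 1, kernel_size=7, padding=3)

            def forward(self, x):
                return x * torch.sigmoid(self.conv(x))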


    (CVPR’19) BASNet: Boundary-Aware Salient Object Detection

    Deep Convolutional Neural Networks have been adopted for salient object detection and achieved the state-of-the-art performance. Most of the previous works however focus on region accuracy but not on the boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy – pixel-, patch- and map- level – by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.

    深度卷积神经网络已被用于显著目标检测,并取得了SOTA。然而,以前的工作大多集中在区域精度上,而不是在边界质量上。在本文中,我们提出了一个预测-细化架构BASNet,以及一个新的混合损失,用于边界感知显著目标检测。具体来说,该架构由一个密集监督的编码器-解码器网络和一个残差细化模块组成,它们分别负责显著性预测和显著图的细化。混合损失通过融合二元交叉熵(BCE)、结构相似度(SSIM)和交并比(IoU)损失,指导网络在像素级、图像块级和图级三个层次上学习输入图像与真值之间的变换。在混合损失的帮助下,所提出的预测-细化架构能够有效地分割显著的物体区域,并准确地预测具有清晰边界的精细结构。在六个公共数据集上的实验结果表明,我们的方法在区域和边界评估指标方面都优于SOTA。我们的方法在单个GPU上的运行速度超过25fps。代码可见:https://github.com/NathanUA/BASNet。
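
    The three-level hybrid loss lends itself to a compact sketch: BCE at the pixel level, a windowed SSIM term at the patch level and an IoU term at the map level, summed with equal weights as in the common formulation. The uniform-window SSIM below is a simplification of the usual Gaussian-window version; `pred` and `target` are assumed to be (B, 1, H, W) maps in [0, 1].

        import torch
        import torch.nn.functional as F

        def ssim_loss(pred, target, window=11, C1=0.01 ** 2, C2=0.03 ** 2):
            """1 - local SSIM computed with a uniform window."""
            pad = window // 2
            mu_p = F.avg_pool2d(pred, window, 1, pad)
            mu_t = F.avg_pool2d(target, window, 1, pad)
            var_p = F.avg_pool2d(pred * pred, window, 1, pad) - mu_p ** 2
            var_t = F.avg_pool2d(target * target, window, 1, pad) - mu_t ** 2
            cov = F.avg_pool2d(pred * target, window, 1, pad) - mu_p * mu_t
            ssim = ((2 * mu_p * mu_t + C1) * (2 * cov + C2)) / \
                   ((mu_p ** 2 + mu_t ** 2 + C1) * (var_p + var_t + C2))
            return 1.0 - ssim.mean()

        def iou_loss(pred, target, eps=1.0):
            inter = (pred * target).sum(dim=(2, 3))
            union = (pred + target - pred * target).sum(dim=(2, 3))
            return (1.0 - (inter + eps) / (union + eps)).mean()

        def hybrid_loss(pred, target):
            """Pixel-level BCE + patch-level SSIM + map-level IoU."""
            return (F.binary_cross_entropy(pred, target)
                    + ssim_loss(pred, target)
                    + iou_loss(pred, target))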


    (CVPR’19) Cascaded Partial Decoder for Fast and Accurate Salient Object Detection

    Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pre-trained convolutional neural networks (CNNs). Compared to high-level features, low-level features contribute less to performance but cost more computations because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs partial decoder which discards larger resolution features of shallower layers for acceleration. On the other hand, we observe that integrating features of deeper layers obtain relatively precise saliency map. Therefore we directly utilize generated saliency map to refine the features of backbone network. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets exhibit that the proposed model not only achieves state-of-the-art performance but also runs much faster than existing models. Besides, the proposed framework is further applied to improve existing multi-level feature aggregation models and significantly improve their efficiency and accuracy.

    现有的最先进的显著目标检测网络依赖于融合预训练的卷积神经网络(CNN)的多级特征。与高层特征相比,低层特征对性能的贡献较小,但由于其空间分辨率较大,因此计算成本较高。在本文中,我们提出了一个新颖的级联部分解码器(CPD)框架,用于快速和准确的显著目标检测。一方面,该框架构建了部分解码器,放弃了较浅层的较大分辨率特征,以达到加速的目的。另一方面,我们观察到,整合较深层的特征可以获得相对精确的显著图。因此,我们直接利用生成的显著图来完善主干网络的特征。这一策略有效地抑制了特征中的干扰因素,并极大地提高了其表征能力。在五个基准数据集上进行的实验表明,所提出的模型不仅达到了SOTA,而且比现有的模型运行得更快。此外,所提出的框架还被进一步应用于改进现有的多级特征融合模型,并显著提高其效率和准确性。
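
    A simplified sketch of the refinement step: the coarse saliency map produced by the first (partial) decoder branch re-weights the deeper backbone features before they enter the second branch, suppressing background distractors. The paper additionally enlarges and blurs the map with a holistic attention operation, which is omitted here.

        import torch
        import torch.nn.functional as F

        def refine_with_coarse_map(deep_feat, coarse_map):
            """Re-weight backbone features with an initial saliency estimate.

            deep_feat:  (B, C, H, W) backbone features of a deeper stage.
            coarse_map: (B, 1, h, w) coarse saliency logits from branch one.
            """
            att = torch.sigmoid(F.interpolate(coarse_map, size=deep_feat.shape[-2:],
                                              mode="bilinear", align_corners=False))
            return deep_feat * att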


    (CVPR’19) A Simple Pooling-Based Design for Real-Time Salient Object Detection

    We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming at providing layers at different feature levels the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail enriched saliency maps. Experiment results show that our proposed approach can more accurately locate the salient objects with sharpened details and hence substantially improve the performance compared to the previous state-of-the-arts. Our approach is fast as well and can run at a speed of more than 30 FPS when processing a 300×400 image. Code can be found at http://mmcheng.net/poolnet/.

    我们通过研究如何扩大卷积神经网络中池化的作用来解决显著目标检测的问题。基于U型结构,我们首先在自下而上的路径上建立了一个全局引导模块(GGM),旨在为不同特征层次的层提供潜在显著对象的位置信息。我们进一步设计了一个特征聚合模块(FAM),使粗略层次的语义信息与自上而下路径的精细层次的特征很好地融合。通过在自上而下途径的融合操作之后添加FAM,来自GGM的粗糙特征可以与各种尺度的特征无缝融合。这两个基于池化的模块允许高层次的语义特征被逐步细化,产生细节丰富的显著图。实验结果表明,我们提出的方法可以更准确地定位具有精细细节的显著对象,因此与以前的SOTA相比,性能得到了大幅提高。我们的方法也很快速,在处理300×400的图像时可以以超过30FPS的速度运行。代码可以在http://mmcheng.net/poolnet/找到。
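
    A sketch of pooling-based aggregation in the spirit of the feature aggregation module described above: the fused feature is average-pooled at several rates, upsampled back and summed before a smoothing convolution, which reduces the aliasing introduced when coarse guidance is merged with finer features. The pool rates and final convolution are illustrative.

        import torch.nn as nn
        import torch.nn.functional as F

        class FeatureAggregation(nn.Module):
            def __init__(self, ch, rates=(2, 4, 8)):
                super().__init__()
                self.rates = rates
                self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

            def forward(self, x):
                out = x
                for r in self.rates:
                    pooled = F.avg_pool2d(x, kernel_size=r, stride=r)
                    out = out + F.interpolate(pooled, size=x.shape[-2:],
                                              mode="bilinear", align_corners=False)
                return self.conv(out)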


    (ICCV’19) Stacked Cross Refinement Network for Edge-Aware Salient Object Detection

    Salient object detection is a fundamental computer vision task. The majority of existing algorithms focus on aggregating multi-level features of pre-trained convolutional neural networks. Moreover, some researchers attempt to utilize edge information for auxiliary training. However, existing edge-aware models design unidirectional frameworks which only use edge features to improve the segmentation features. Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper. Our framework aims to simultaneously refine multi-level features of salient object detection and edge detection by stacking Cross Refinement Unit (CRU). According to the logical interrelations, the CRU designs two direction-specific integration operations, and bidirectionally passes messages between the two tasks. Incorporating the refined edge-preserving features with the typical U-Net, our model detects salient objects accurately. Extensive experiments conducted on six benchmark datasets demonstrate that our method outperforms existing state-of-the-art algorithms in both accuracy and efficiency. Besides, the attribute-based performance on the SOC dataset show that the proposed model ranks first in the majority of challenging scenes. Code can be found at https://github.com/wuzhe71/SCAN.

    显著目标检测是一项基本的计算机视觉任务。现有的大多数算法都集中在融合预先训练好的卷积神经网络的多层次特征上。此外,一些研究人员试图利用边缘信息进行辅助训练。然而,现有的边缘感知模型设计的是单向框架,只利用边缘特征来改进分割特征。在二元分割和边缘图之间的逻辑关系的激励下,我们在本文中提出了一个新颖的堆叠交叉细化网络(SCRN)用于显著目标检测。我们的框架旨在通过堆叠交叉细化单元(CRU)同时细化显著目标检测和边缘检测的多层次特征。根据逻辑上的相互关系,CRU设计了两个特定方向的融合操作,并在两个任务之间双向传递信息。将细化的边缘保留特征与典型的U-Net相结合,我们的模型能够准确地检测出显著对象。在六个基准数据集上进行的广泛实验表明,我们的方法在准确性和效率方面都优于SOTA。此外,在SOC数据集上的基于属性的表现表明,所提出的模型在大多数具有挑战性的场景中排名第一。代码可以在https://github.com/wuzhe71/SCAN中找到。


    (ICCV’19) Selectivity or Invariance: Boundary-aware Salient Object Detection

    Typically, a salient object detection (SOD) model faces opposite requirements in processing object interiors and boundaries. The features of interiors should be invariant to strong appearance change so as to pop-out the salient object as a whole, while the features of boundaries should be selective to slight appearance change to distinguish salient objects and background. To address this selectivity-invariance dilemma, we propose a novel boundary-aware network with successive dilation for image-based SOD. In this network, the feature selectivity at boundaries is enhanced by incorporating a boundary localization stream, while the feature invariance at interiors is guaranteed with a complex interior perception stream. Moreover, a transition compensation stream is adopted to amend the probable failures in transitional regions between interiors and boundaries. In particular, an integrated successive dilation module is proposed to enhance the feature invariance at interiors and transitional regions. Extensive experiments on six datasets show that the proposed approach outperforms 16 state-of-the-art methods.

    通常情况下,显著目标检测(SOD)模型在处理物体内部和边界时面临相反的要求。内部的特征应该对强烈的外观变化保持不变,以便将显著物体作为一个整体凸显出来,而边界的特征应该对微小的外观变化具有选择性,以区分显著的对象和背景。为了解决这种选择性-不变性的困境,我们为基于图像的SOD提出了一种带有连续膨胀(successive dilation)的新型边界感知网络。在这个网络中,通过加入边界定位流来提高边界的特征选择性,而通过一个复杂的内部感知流来保证内部的特征不变性。此外,还采用了一个过渡补偿流来修正内部和边界之间的过渡区域可能出现的错误。特别是,提出了一个集成的连续膨胀模块,以提高内部和过渡区域的特征不变性。在六个数据集上进行的广泛实验表明,所提出的方法优于16种SOTA。


    (ICCV’19) EGNet:Edge Guidance Network for Salient Object Detection

    Fully convolutional neural networks (FCNs) have shown their advantages in the salient object detection task. However, most existing FCNs-based methods still suffer from coarse object boundaries. In this paper, to solve this problem, we focus on the complementarity between salient edge information and salient object information. Accordingly, we present an edge guidance network (EGNet) for salient object detection with three steps to simultaneously model these two kinds of complementary information in a single network. In the first step, we extract the salient object features by a progressive fusion way. In the second step, we integrate the local edge information and global location information to obtain the salient edge features. Finally, to sufficiently leverage these complementary features, we couple the same salient edge features with salient object features at various resolutions. Benefiting from the rich edge information and location information in salient edge features, the fused features can help locate salient objects, especially their boundaries more accurately. Experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods on six widely used datasets without any pre-processing and post-processing. The source code is available at http://mmcheng.net/egnet/.

    全卷积神经网络(FCN)在显著目标检测任务中显示了其优势。然而,大多数现有的基于FCN的方法仍然受到粗糙的对象边界的影响。在本文中,为了解决这个问题,我们把重点放在显著的边缘信息和显著的对象信息之间的互补性。因此,我们提出了一个用于显著目标检测的边缘引导网络(EGNet),通过三个步骤在一个网络中同时模拟这两种互补的信息。在第一步中,我们通过渐进式融合的方式提取显著目标特征。第二步,我们融合局部边缘信息和全局位置信息以获得显著的边缘特征。最后,为了充分地利用这些互补的特征,我们将相同的显著边缘特征与不同分辨率的显著目标特征相结合。受益于显著边缘特征中丰富的边缘信息和位置信息,融合后的特征可以帮助定位显著目标,尤其是它们的边界更加准确。实验结果表明,所提出的方法在六个广泛使用的数据集上的表现优于SOTA,不需要任何预处理和后处理。源代码可在http://mmcheng.net/egnet/找到。


    (ICCV’19) Employing Deep Part-Object Relationships for Salient Object Detection

    Despite Convolutional Neural Networks (CNNs) based methods have been successful in detecting salient objects, their underlying mechanism that decides the salient intensity of each image part separately cannot avoid inconsistency of parts within the same salient object. This would ultimately result in an incomplete shape of the detected salient object. To solve this problem, we dig into part-object relationships and take the unprecedented attempt to employ these relationships endowed by the Capsule Network (CapsNet) for salient object detection. The entire salient object detection system is built directly on a Two-Stream Part-Object Assignment Network (TSPOANet) consisting of three algorithmic steps. In the first step, the learned deep feature maps of the input image are transformed to a group of primary capsules. In the second step, we feed the primary capsules into two identical streams, within each of which low-level capsules (parts) will be assigned to their familiar high-level capsules (object) via a locally connected routing. In the final step, the two streams are integrated in the form of a fully connected layer, where the relevant parts can be clustered together to form a complete salient object. Experimental results demonstrate the superiority of the proposed salient object detection network over the state-of-the-art methods.

    尽管基于卷积神经网络(CNN)的方法在检测显著目标方面取得了成功,但其分别决定每个图像部分显著程度的底层机制无法避免同一显著目标内各部分之间的不一致。这最终会导致检测到的显著目标形状不完整。为了解决这个问题,我们挖掘了部分与整体之间的关系,并首次尝试利用胶囊网络(CapsNet)所赋予的这种关系进行显著目标检测。整个显著目标检测系统直接建立在双流部分-对象分配网络(TSPOANet)上,包括三个算法步骤。第一步,将学习到的输入图像深度特征图转换为一组初级胶囊。第二步,我们将初级胶囊送入两个相同的流中,在每个流中,低级胶囊(部分)将通过局部连接的路由分配给它们熟悉的高级胶囊(整体对象)。最后一步,这两个流以全连接层的形式被融合,其中相关的部分可以被聚合在一起,形成一个完整的显著目标。实验结果表明,所提出的显著目标检测网络优于SOTA方法。


    (CVPR’20) Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection

    Recently, contour information largely improves the performance of saliency detection. However, the discussion on the correlation between saliency and contour remains scarce. In this paper, we first analyze such correlation and then propose an interactive two-stream decoder to explore multiple cues, including saliency, contour and their correlation. Specifically, our decoder consists of two branches, a saliency branch and a contour branch. Each branch is assigned to learn distinctive features for predicting the corresponding map. Meanwhile, the intermediate connections are forced to learn the correlation by interactively transmitting the features from each branch to the other one. In addition, we develop an adaptive contour loss to automatically discriminate hard examples during learning process. Extensive experiments on six benchmarks well demonstrate that our network achieves competitive performance with a fast speed around 50 FPS. Moreover, our VGG-based model only contains 17.08 million parameters, which is significantly smaller than other VGG-based approaches. Code has been made available at: https://github.com/moothes/ITSD-pytorch.

    最近,轮廓信息在很大程度上提高了显著性检测的性能。然而,关于显著性和轮廓之间相关性的讨论仍然很少。在本文中,我们首先分析了这种相关性,然后提出了一个交互式双流解码器来探索多种线索,包括显著性、轮廓和它们的相关性。具体来说,我们的解码器由两个分支组成,一个是显著性分支,一个是轮廓分支。每个分支都被指定学习独特的特征来预测相应的图。同时,中间连接通过交互式地将每个分支的特征传递给另一个分支,被迫学习两者之间的相关性。此外,我们还开发了一个自适应的轮廓损失,以便在学习过程中自动分辨出困难的样本。在六个基准上进行的广泛实验表明,我们的网络以50 FPS左右的速度实现了有竞争力的性能。此外,我们基于VGG的模型只包含1708万个参数,这比其他基于VGG的方法小得多。代码可在https://github.com/moothes/ITSD-pytorch找到。


    (CVPR’20) Multi-scale Interactive Network for Salient Object Detection

    Deep-learning based salient object detection methods achieve great progress. However, the variable scale and unknown category of salient objects are great challenges all the time. These are closely related to the utilization of multi-level and multi-scale features. In this paper, we propose the aggregate interaction modules to integrate the features from adjacent levels, in which less noise is introduced because of only using small up-/down-sampling rates. To obtain more efficient multi-scale features from the integrated features, the self-interaction modules are embedded in each decoder unit. Besides, the class imbalance issue caused by the scale variation weakens the effect of the binary cross entropy loss and results in the spatial inconsistency of the predictions. Therefore, we exploit the consistency-enhanced loss to highlight the fore-/back-ground difference and preserve the intra-class consistency. Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches. The source code will be publicly available at this https https://github.com/lartpang/MINet.

    基于深度学习的显著目标检测方法取得了很大进展。然而,显著目标的可变尺度和未知类别一直是巨大的挑战,这些都与多层次、多尺度特征的利用密切相关。在本文中,我们提出了聚合交互模块来整合相邻层次的特征,由于只使用较小的上/下采样率,引入的噪声较少。为了从整合后的特征中获得更高效的多尺度特征,自交互模块被嵌入到每个解码器单元中。此外,由尺度变化引起的类别不平衡问题削弱了二元交叉熵损失的效果,导致预测的空间不一致。因此,我们利用一致性增强损失来突出前/背景的差异并保持类内的一致性。在五个基准数据集上的实验结果表明,所提出的方法在没有任何后处理的情况下,与23种SOTA方法相比表现良好。源代码将在以下网址公开:https://github.com/lartpang/MINet。
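    下面给出"只聚合相邻层级、且只用较小上/下采样率"这一思路的极简草图;这里假设相邻层特征分辨率恰好相差 2 倍,"池化/插值 + 拼接 + 卷积"的结构为示意性写法,并非论文中聚合交互模块的原始实现:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentAggregation(nn.Module):
    """极简示意:仅整合相邻层级(上一层、本层、下一层)的特征(假设性实现)。"""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 3, channels, 3, padding=1)

    def forward(self, higher, current, lower):
        # higher:分辨率更高的浅层特征;lower:分辨率更低的深层特征
        h = F.avg_pool2d(higher, kernel_size=2)                  # 仅下采样 2 倍
        l = F.interpolate(lower, scale_factor=2,
                          mode="bilinear", align_corners=False)  # 仅上采样 2 倍
        # 采样率小,插值/池化带来的噪声也相对较小
        return self.fuse(torch.cat([h, current, l], dim=1))
```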


    (CVPR’20) Label Decoupling Framework for Salient Object Detection

    To get more accurate saliency maps, recent methods mainly focus on aggregating multi-level features from fully convolutional network (FCN) and introducing edge information as auxiliary supervision. Though remarkable progress has been achieved, we observe that the closer the pixel is to the edge, the more difficult it is to be predicted, because edge pixels have a very imbalance distribution. To address this problem, we propose a label decoupling framework (LDF) which consists of a label decoupling (LD) procedure and a feature interaction network (FIN). LD explicitly decomposes the original saliency map into body map and detail map, where body map concentrates on center areas of objects and detail map focuses on regions around edges. Detail map works better because it involves much more pixels than traditional edge supervision. Different from saliency map, body map discards edge pixels and only pays attention to center areas. This successfully avoids the distraction from edge pixels during training. Therefore, we employ two branches in FIN to deal with body map and detail map respectively. Feature interaction (FI) is designed to fuse the two complementary branches to predict the saliency map, which is then used to refine the two branches again. This iterative refinement is helpful for learning better representations and more precise saliency maps. Comprehensive experiments on six benchmark datasets demonstrate that LDF outperforms state-of-the-art approaches on different evaluation metrics.

    为了得到更准确的显著图,最近的方法主要集中在从全卷积网络(FCN)中融合多级特征,并引入边缘信息作为辅助监督。虽然已经取得了显著的进展,但我们观察到,越是靠近边缘的像素越难以被预测,因为边缘像素的分布非常不平衡。为了解决这个问题,我们提出了一个标签解耦框架(LDF),它由一个标签解耦(LD)过程和一个特征交互网络(FIN)组成。标签解耦明确地将原始显著图分解为主体图和细节图,其中主体图集中在物体的中心区域,细节图集中在边缘附近的区域。细节图的效果更好,因为它涉及的像素比传统的边缘监督多得多。与显著图不同的是,主体图抛弃了边缘像素,只关注中心区域,这成功地避免了训练过程中边缘像素的干扰。因此,我们在FIN中采用了两个分支,分别处理主体图和细节图。特征交互(FI)的设计是为了融合这两个互补的分支来预测显著图,预测出的显著图又被用来对这两个分支进行再次细化。这种迭代式的细化有助于学习更好的表征和更精确的显著图。在六个基准数据集上进行的综合实验表明,LDF在不同的评价指标上优于SOTA方法。
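    原文把真值显著图分解为主体图与细节图;下面给出一种借助距离变换实现这种拆分的示意代码,其中的归一化方式和"细节 = 原图 − 主体"的定义均为便于说明的假设,并非论文中标签解耦的精确定义:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def decouple_label(gt: np.ndarray):
    """把二值真值图拆成主体图与细节图的示意实现(假设性写法)。

    用前景内部的距离变换近似"主体":离边缘越远响应越强;
    细节图取原图与主体图之差,因而集中在边缘附近的像素上。
    """
    gt = (gt > 0.5).astype(np.float32)
    dist = distance_transform_edt(gt)        # 每个前景像素到最近背景像素的距离
    body = dist / dist.max() if dist.max() > 0 else dist   # 中心接近 1,边缘附近接近 0
    detail = gt - body                        # 只在边缘附近保留较大的值
    return body, detail
```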


    (ECCV’20) Suppress and Balance: A Simple Gated Network for Salient Object Detection

    Most salient object detection approaches use U-Net or feature pyramid networks (FPN) as their basic structures. These methods ignore two key problems when the encoder exchanges information with the decoder: one is the lack of interference control between them, the other is without considering the disparity of the contributions of different encoder blocks. In this work, we propose a simple gated network (GateNet) to solve both issues at once. With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder. We design a novel gated dual branch structure to build the cooperation among different levels of features and improve the discriminability of the whole network. Through the dual branch design, more details of the saliency map can be further restored. In addition, we adopt the atrous spatial pyramid pooling based on the proposed “Fold” operation (Fold-ASPP) to accurately localize salient objects of various scales. Extensive experiments on five challenging datasets demonstrate that the proposed model performs favorably against most state-of-the-art methods under different evaluation metrics.

    大多数显著目标检测方法使用U-Net或特征金字塔网络(FPN)作为其基本结构。这些方法忽略了编码器与解码器交换信息时的两个关键问题:一是两者之间缺乏干扰控制,二是没有考虑不同编码器块贡献的差异性。在这项工作中,我们提出一个简单的门控网络(GateNet)来同时解决这两个问题。在多级门控单元的帮助下,来自编码器的有价值的上下文信息可以被最优地传输给解码器。我们设计了一个新颖的门控双分支结构,以建立不同层次特征之间的协作,提高整个网络的可判别性。通过双分支设计,可以进一步恢复显著图的更多细节。此外,我们还采用了基于所提出的"折叠"操作的空洞空间金字塔池化(Fold-ASPP),以准确定位不同尺度的显著目标。在五个具有挑战性的数据集上进行的广泛实验表明,所提出的模型在不同的评估指标下与大多数SOTA方法相比表现良好。
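    下面用几行 PyTorch 代码示意"在编码器特征进入解码器之前先做门控"的基本思路;门控图由编码器与解码器特征共同生成、再乘回编码器特征,这一具体构成是本文的假设性写法,并非 GateNet 的官方实现:

```python
import torch
import torch.nn as nn

class GateUnit(nn.Module):
    """极简示意:门控编码器跳连特征,控制其流向解码器的信息量(假设性实现)。"""
    def __init__(self, enc_channels: int, dec_channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(enc_channels + dec_channels, 1, kernel_size=1),
            nn.Sigmoid())

    def forward(self, enc_feat, dec_feat):
        # 假设两者空间尺寸一致;门控值越小,传入解码器的编码器信息越少,
        # 从而起到抑制干扰、体现不同编码块贡献差异的作用
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        return enc_feat * g
```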


    (ECCV’20) Highly Efficient Salient Object Detection with 100K Parameters

    Salient object detection models often demand a considerable amount of computation cost to make precise prediction for each pixel, making them hardly applicable on low-power devices. In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree. We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features, while reducing the representation redundancy by a novel dynamic weight decay scheme. The effective dynamic weight decay scheme stably boosts the sparsity of parameters during training, supports learnable number of channels for each scale in gOctConv, allowing 80% of parameters reduce with negligible performance drop. Utilizing gOctConv, we build an extremely light-weighted model, namely CSNet, which achieves comparable performance with about 0.2% parameters (100k) of large models on popular salient object detection benchmarks.

    显著目标检测模型通常需要相当多的计算成本来对每个像素进行精确预测,这使其很难应用于低功耗设备。在本文中,我们旨在通过进一步提高网络效率来缓解计算成本和模型性能之间的矛盾。我们提出了一个灵活的卷积模块,即广义OctConv(gOctConv),以有效地利用阶段内和跨阶段的多尺度特征,同时通过一种新的动态权重衰减方案减少表征的冗余。有效的动态权重衰减方案在训练过程中稳定地提高了参数的稀疏性,并支持gOctConv中每个尺度的通道数可学习,从而在性能下降可忽略不计的情况下减少80%的参数。利用gOctConv,我们建立了一个极其轻量的模型,即CSNet,它在流行的显著目标检测基准上以大约0.2%的参数量(100k)实现了与大型模型相当的性能。

  • 基于一种改进的跨层级特征融合的循环全卷积神经网络, 提出了一种结合深度学习的图像显著目标检测算法。通过改进的深度卷积网络模型对输入图像进行特征提取, 利用跨层级联合框架进行特征融合, 生成了高层语义特征的...
  • 人工智能-目标检测-显著目标检测与提取研究.pdf
  • 人工智能-目标检测-显著目标检测方法及其应用研究.pdf
  • 针对具有杂乱背景图像的显著目标检测问题,提出了一种无需任何先验知识,通过分析计算区域平均显著值的对比度来提取显著目标的方法.根据显著图,计算出显著目标的最小边界框与其周围区域的显著性差异,且通过折半...
  • 基于最小生成树实现的显著目标实时检测,是文章Real-Time Salient Object Detection with a Minimum Spanning Tree的源码
  • 人工智能-目标检测-夜间场景下显著目标检测方法研究.pdf
  • 人工智能-目标检测-特征融合的显著目标检测方法研究.pdf
  • 人工智能-目标检测-基于区域特征融合的显著目标检测研究.pdf
  • 人工智能-目标检测-基于傅里叶变换的显著目标检测方法研究.pdf
  • 并结合像素的“Center-Surround冶模型和核密度估计,提出一种能由粗到精逐步感知和获取视场中视觉显著性目标位置及尺度的实时显著目标检测算法,称其为基于贝叶斯框架的显著目标检测. 通过在微软MSRA数据集上进行ROC和...
  • 全局低秩显著性检测算法首先根据自然图像前景目标和背景亮度、颜色的差异性重构出图像前景显著目标;然后利用低秩分解对图像中的非显著性区域进行抑制。
  • 人工智能-目标检测-基于先验融合和流形排序的显著目标检测.pdf
  • 人工智能-目标检测-基于视觉感知机制的显著目标检测算法研究.pdf
  • 人工智能-目标检测-基于深度卷积神经网络的显著目标检测方法.pdf
  • 人工智能-目标检测-基于点-集度量学习的显著目标检测.pdf
  • 人工智能-目标检测-基于生物视觉机制的视频显著目标检测算法研究.pdf
  • 人工智能-目标检测-基于人类视觉注意机制的显著目标检测与分割.pdf

    学习笔记–深度学习时代的显著目标检测综述

    这篇文章作者引用了182篇参考文献,撰写正文16页,堪称显著目标检测领域综述的良心之作。本文系论文学习笔记。

    1 引言

    文章开篇作者首先介绍了显著目标检测的起源与发展,然后对先前有关的综述文章加以总结。本文的贡献主要为:
    从多个角度系统回顾深度SOD模型
    创新一种基于属性的深度SOD模型性能评估方法
    讨论输入扰动的影响
    研究对抗攻击对SOD模型的影响
    交叉数据集泛化研究
    对一些开放性问题和未来的研究方向的概述

    2 基于深度学习的SOD模型

    作者从四个维度分析54个深度SOD模型:SOD典型网络结构、从监督层级看SOD、从学习范式看SOD、目标级别与实例级别的SOD。作者将表中列举的诸多SOD模型按照上述分类方法分类,并粗略按照时间顺序逐个进行阐述。

    2.1 SOD典型网络结构

    SOD典型网络结构按照时间顺序经历了多层感知机、全卷积网络、混合网络、胶囊网络的四个发展阶段,全卷积网络是主流方向。
    多层感知机方法将图像处理为多个超像素、块(MCDL、ELD、MDF、SuperCNN)或通用目标区域(LEGS、MAP、SSD)等单元,然后为每一单元训练一个多层感知机为该区域打分,得出显著预测值。
    多层感知机网络虽然能取得优于非深度模型的结果,但由于需要逐个处理每一个图像子单元,时间开销相当大;而且图像被分割为多个子单元后,像素间关键的空间信息会被丢失,不利于模型性能的提升。借鉴全卷积网络(FCN)在语义分割领域取得的优异成绩,研究者通过修改流行的VGGNet和ResNet分类网络构建了现行的FCN模型,大致可以总结为单流网络(RFCN、RACDNN、DLS、UCF、DUS、LICNN、SuperVAE)、多流网络(MSRNet、SRM、FSN、HRSOD、DEF)、边融合网络(DSS、NLDF、Amulet、DSOS、RADF、RSDNet-R、CPD、MWS、EGNet)、自底向上自顶向下网络(DHSNet、SBF、BDMP、RLN、PAGR、ASNet、PiCANet、RAS、AFNet、BASNet、MLSLNet、PAGE-Net、PoolNet、PS、JDFPR)和分支网络(SU、WSS、ASMO、C2S-Net、CapSal、BANet、SCRN、SSNet)。
    混合网络是将FCN子网络与多层感知机进行融合,基于多尺度上下文产生显著性预测(DCL、CRPSD)。
    胶囊网络是由Hinton等人提出的新型网络,Y. Liu 和 Q. Qi 等人将胶囊网络应用于SOD检测(TSPOANet)。

    2.2 从监督层级看SOD

    基于是否使用人工标注的显著真值图进行训练,深度SOD方法可以分为全监督、无监督、弱监督方法。
    全监督训练方法一方面需要耗费大量时间和精力进行像素级标注,另一方面在精细标注的真值图上训练出来的模型容易过拟合、对现实场景的泛化能力较差。
    无/弱监督训练过程中不使用具体的真值图,可以避免手工标注真值图的工作,目前主要利用图像级分类标签(WSS、LICNN、SuperVAE)或者伪像素显著标签实施(SBF、ASMO、DUS、C2S-Net、MWS)。

    2.3 从学习范式看SOD

    从学习范式的角度来看,SOD可以分为单任务学习和多任务学习。
    机器学习中,标准的方法就是单任务学习,一次学一个任务,绝大多数SOD模型都是采用的这种学习方法。利用某一领域知识监督训练SOD模型。
    人类可以在已有的相关经验的基础上学习处理新的任务,那机器是否也可以如此呢?由此产生多任务学习,它结合相关任务训练信号中的特定信息,提高了模型的泛化能力,并可以缓解巨量参数模型训练时的数据匮乏问题。常见的多任务组合有:显著目标计数(MAP、DSOS、RSDNet)、注视点预测(SU、ASNet)、图像分类(WSS、ASMO)、噪音模式建模(DUS)、语义分割(RFCN、SSNet)、轮廓/边缘检测(NLDF、C2S-Net、AFNet、MLSLNet、PAGE-Net、PoolNet、BANet、EGNet、SCRN)、图像字幕(CapSal)。

    2.4 目标级别与实例级别的SOD

    目标级SOD方法输出的预测图只标记每个像素的显著性而不区分不同的目标。而实例级(MAP、MSRNet)既标注每个像素的显著性又区分每个目标。
    大多数SOD方法都是目标级别的,只检测像素显著性不关注单独的实例。
    实例级SOD方法产生带有明确对象标签的显著性掩模,对显著性区域进行更详细的解析。对许多需要更细差别的实际应用程序而言,实例级信息是至关重要的。

    3 SOD数据集

    作者分析了自2007年至2019年间用于SOD检测的19个数据集。


    4 评估指标

    这一部分作者主要介绍了7种常用的评估指标:Precision-Recall (PR)、F-measure、Mean Absolute Error (MAE)、Weighted Fβ measure (Fbw)、Structural measure (S-measure)、Enhanced-alignment measure (E-measure)、Salient Object Ranking (SOR)。更加详细的公式以及计算详见论文[1]。
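    下面给出其中 MAE 与固定阈值 F-measure 的最小计算示例(NumPy 实现);β² 取 0.3 是该领域的常用设置,固定阈值 0.5 与变量名为示意性假设,其余指标的公式请参考原论文:

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Error:预测显著图与真值图的平均绝对差,二者均已归一化到 [0, 1]。"""
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred: np.ndarray, gt: np.ndarray,
              thresh: float = 0.5, beta2: float = 0.3) -> float:
    """固定阈值下的 F-measure;若遍历多个阈值取最大值,即得到 max-F。"""
    binary = pred >= thresh
    gt_bin = gt > 0.5
    tp = np.logical_and(binary, gt_bin).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt_bin.sum() + 1e-8)
    return float((1 + beta2) * precision * recall /
                 (beta2 * precision + recall + 1e-8))
```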

    5 基准分析

    这部分从基准结果的性能总览、基于属性的评估、输入扰动影响分析、对抗攻击分析、数据集交叉泛化评估5个方面展开。
    基准结果的性能总览:在6个数据集上,采用 max-F measure、S-measure、MAE 三个指标,评估了47种SOD模型。

    基于属性的评估:在分析之前,作者首先对分析使用的模型、数据集以及属性做了介绍。
    作者选择了6个模型(3个非深度模型,3个深度模型),并随机从6个SOD数据集中各选出300张图片,组成一个1800张的数据库,进行各项分析。属性分析主要是从显著目标分类、挑战和场景分类三方面进行的,每一方面又分为几类,具体见上表。请注意,这些属性不是互斥的。每一类别中的数字代表该类别在总数据集中的比例。表6的具体分析是按照最后两行、每一栏进行的,作者的具体分析详见论文。
    除了就Max-F进行分析,作者还就F-Measure进行了分析。
    作者将扰动分为随机输入扰动和故意设计的对抗攻击输入。
    输入扰动影响分析:这一部分的分析主要体现在表8中,噪音有 Gaussian blur、Gaussian noise、Rotation、Gray 四类,其中前三种根据参数不同又各分两档。整体而言,非深度方法比深度方法的鲁棒性更强,主要受益于人工超像素层级特征的鲁棒性。针对每一种噪音具体分析:非深度模型的性能基本不受旋转变换的影响,但对强 Gaussian noise 十分敏感;深度方法对 Gaussian blur 和强 Gaussian noise 敏感,主要是因为这两类噪声会影响浅层网络的感受野。
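    下面给出一段对输入图像施加这四类扰动的示例代码(NumPy + SciPy);其中模糊核、噪声强度与旋转角度均为示意性取值,并非综述实验中使用的具体参数:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def perturb(img: np.ndarray, kind: str) -> np.ndarray:
    """对 H x W x 3、取值范围 [0, 255] 的图像施加四类随机扰动之一(参数为示意取值)。"""
    img = img.astype(np.float32)
    if kind == "gaussian_blur":
        out = gaussian_filter(img, sigma=(3, 3, 0))           # 只在空间维度上做高斯模糊
    elif kind == "gaussian_noise":
        out = img + np.random.normal(0, 25, img.shape)        # 加性高斯噪声
    elif kind == "rotation":
        out = rotate(img, angle=15, axes=(1, 0), reshape=False, mode="nearest")
    elif kind == "gray":
        gray = img.mean(axis=2, keepdims=True)                # 去掉颜色信息
        out = np.repeat(gray, 3, axis=2)
    else:
        raise ValueError(f"unknown perturbation: {kind}")
    return np.clip(out, 0, 255).astype(np.uint8)
```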
    对抗攻击分析:对抗攻击是指为图片添加人眼看不出明显差别的噪声,却会导致机器识别错误。这项研究在分类任务中已经广泛开展,但在SOD领域还有待开发。这一部分的实验在三个典型的深度模型上开展,分析了SOD模型的鲁棒性(表9对角线)和网络间的可转移性(表9非对角线),测试依旧在混合数据集上进行。将SOD视作特殊的语义分割,作者采用了语义分割中的对抗攻击算法DAG。通过定性与定量分析可以看出,轻微的对抗攻击就会引起巨大的性能下降,与随机施加的噪音相比,这些对抗样本通常会导致更糟糕的预测。可转移性是指针对一个模型生成的对抗样本在不进行任何修改的情况下误导另一个模型的能力。结果表明,DAG攻击很少在不同的SOD网络之间转移,这可能是因为在不同的SOD模型中,攻击的空间分布非常不同。
    数据集交叉泛化评估:在这部分作者首先介绍了设计的编解码网络,然后就研究结果进行分析。网络的实施细节见文献。这一部分最值得借鉴的是数据集交叉的思路。
    考虑到作者选用的6个数据集中ECSSD的图片数最少(1000张),模型在不同数据集上训练时均随机选取1000张图片,其中800张用于训练,200张用于验证。下表是一些分析结果。按列看,表示所有模型在同一个数据集上的性能,可以反映该数据集图片检测的难度;通过比较最后一行的 Mean others 可知,SOC 最难,MSRA10K 最简单。
    按行看,是同一个模型在不同数据集上测试的性能,反映出该训练集对应模型的泛化能力。MSRA10K 泛化能力最差,其行的下降百分比大于 0 且最大;DUTS 拥有最好的泛化能力,其行的下降百分比小于 0 且绝对值最大。下降比的计算公式与完整结果见原文图表。
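    原文的下降比公式以图片形式给出,这里按"模型在自身数据集上的得分与在其他数据集上平均得分(Mean others)的相对差"给出一种假设性的计算示例,仅用于帮助理解按行/按列的分析方式,具体定义以原论文为准:

```python
import numpy as np

def drop_ratio(scores: np.ndarray) -> np.ndarray:
    """scores[i, j]:在数据集 i 上训练、在数据集 j 上测试的得分(如 max-F)。

    返回每个训练集对应的下降比(假设性定义):
    大于 0 表示跨数据集性能下降,小于 0 表示在其他数据集上反而更好(泛化能力强)。
    """
    n = scores.shape[0]
    ratios = np.zeros(n)
    for i in range(n):
        self_score = scores[i, i]
        mean_others = np.delete(scores[i], i).mean()   # 该模型在其余数据集上的平均表现
        ratios[i] = (self_score - mean_others) / self_score
    return ratios
```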

    6 讨论

    讨论主要围绕SOD模型设计、数据集收集、显著性排名与相对显著性、与注视点的关联、语义SOD、无/弱监督训练SOD、SOD在现实场景中的应用这7个方面展开。
    在进行SOD模型设计时,应该多从特征集合、损失函数、网络拓扑、动态推理结构四个角度思考问题。深度模型最大的优势在于可以提取比传统方法丰富得多的特征,然而网络不同层提取的特征如何有效融合将直接影响SOD模型的预测结果,原文给出了目前常见的特征融合策略:多流/多分辨率融合、自顶向下自底向上融合、边输出融合、其他研究领域相关特征融合(注视点、语义分割)等。针对损失函数的设计,研究人员或将SOD的评估指标写入损失函数,或直接利用MIoU。网络拓扑直接影响网络的训练难度和参数量,诸多实验表明以ResNet为骨架构建的网络往往优于VGG;在网络拓扑设计这个角度,AutoML将会是一个很有前景的研究方向。动态推理结构主要用于降低网络参数并最大限度地保持网络性能,可以理解为选择网络激活部分输出特征或者实现早期停止;静态方法主要有卷积核分解、网络修剪。
    关于SOD数据集收集,作者考虑了四点:现有数据集与现实世界之间存在数据选择偏差(现有数据集较为理想化,背景单一,每张图必然存在显著目标)、不同数据集之间相同目标的标签设计规则不一致、标签的粗细程度不同,以及特定领域数据集的问题。
    显著性排名与相对显著性这一小节讨论了图像中存在多个显著物体或区域时相对显著性的度量问题,作者总结相关的解决办法有显著目标排名和多观测器投票。
    在与注视点的关联这一小节,作者讨论了两者的相似性和差异性,列举了一些两者联合进行SOD开发的工作。
    语义SOD使用语义分割数据集预训练SOD模型或多任务并行训练SOD和语义分割模型的方式,将语义分割信息用于SOD。
    无/弱监督训练SOD主要解决全监督高成本和时间消耗的问题,在研究和实际应用中具有很大的价值,需要借助分类级别的标签或者伪像素标签。
    SOD在现实场景中的应用,为了满足移动和嵌入式应用程序的需求,需要更简单、更轻的网络架构,可以利用模型压缩或知识蒸馏等技术。

  • 人工智能-目标检测-基于极限学习机与目标候选子空间优化的显著目标检测.pdf
  • 人工智能-目标检测-基于视觉注意机制的显著目标检测与提取算法研究.pdf
  • 人工智能-目标检测-车辆行驶中的视觉显著目标检测及语义分析研究.pdf


     引言

    最近基于深度学习的显著目标检测方法取得了出色的性能。然而现有的大多数方法都是基于低分辨率输入设计的,这些模型在高分辨率图片上的表现不尽人意,这是由于网络的采样深度和感受野范围之间的矛盾所导致的。

    为了缓解这一矛盾,我们提出了一个新颖的单阶段架构名叫金字塔嫁接网络(PGNet),使用transformer和CNN骨干网络从不同分辨率图像中独立地提取特征,然后将特征信息从transformer分支嫁接到CNN分支。

    同时我们提供了一个新的超高分辨率显著目标检测数据集(UHRSD),包含了5,920张4K-8K分辨率的图片及其像素级标注。这是我们所知的目前规模最大分辨率最高的显著目标检测数据集,希望可以为未来高分辨率分割任务的研究提供帮助。大量实验表明,我们的方法简单高效地在高分辨率显著检测任务上取得了良好的表现。


     1.论文、代码和数据集下载链接


    论文地址:https://arxiv.org/abs/2204.05041

    数据集地址:

    https://drive.google.com/drive/folders/1u3K65AaKh78P5qKXTsMjVI1SvBXNAPFk?usp=sharing

    代码地址:https://github.com/iCVTEAM/PGNet

     2.研究动机

    人类的视觉系统具有从复杂场景中快速、准确地定位感兴趣物体或区域的能力,称为选择性注意力机制。显著物体检测(Salient Object Detection, SOD)是对该机制的一种模拟,旨在分割给定图像中最具视觉吸引力的物体或区域。大多数现有的SOD方法在一个特定的输入分辨率范围内表现的很好(例如224×224,384×384)。

    但随着图像采集设备的快速更新,获取到的图片分辨率也随之急速增长,高分辨率图像(例如1080P,2K,4K图像)在日常生活中很容易被获取。然而,这些日常获取到的图像显然超出了现有模型可处理的分辨率范围。

    一些现有的方法已经开始关注高分辨率输入导致的问题,但这些方法都是多阶段的,在不同阶段以不同输入分辨率处理图像:在第一阶段以低分辨率输入获取全局语义,得到初步的模糊预测结果;

    在第二阶段,使用第一阶段得到的粗略结果以及高分辨率输入,通过轻量级网络在保证分辨率的同时避免巨大的计算消耗,对第一阶段的模糊结果进行细化,得到最终结果。但这样也带来了新的问题,例如多阶段导致的推理速度变慢以及优化相对困难等。


    图1:不同结构对比。(a) 输入图片 (b) 真值标签 (c)高分辨率直接输入卷积网络的结果 (d)下采样后输入Swin-FPN结果 (e) 我们方法的结果

    我们认为单一深度的网络无法同时解决扩大感受野与保留高分辨率细节之间的矛盾,因此我们提出分别以不同的输入分辨率提取两组特征,然后将信息从一个分支嫁接到另一分支。

    为此,我们重新思考了双分支的架构并设计了一个新颖的单阶段深度网络金字塔嫁接网络(Pyramid Grafting Network, PGNet)来解决高分辨率显著性的问题。

    我们使用了ResNet和Transformer作为特征提取器,并行地提取不同空间大小的特征。Transformer分支首先以特征金字塔的形式对提取到的特征进行解码,然后在两个分支特征大小相近的位置将全局语义信息传递给ResNet分支,我们将这一过程称为特征嫁接。

    最终,ResNet分支完成整个解码过程。相比于一般的特征金字塔网络,我们以更低的成本构建了一个更高的特征金字塔。为了更好地嫁接两个跨模型的特征,我们基于注意力机制设计了跨模型嫁接模块以及配套的注意力引导损失进一步指导嫁接。

     3.UHRSD数据集

    对于监督学习而言,训练数据是非常重要的。在此之前,仅有的高分辨率SOD训练集只有1610张图片,经过我们的实验发现,仅在这个数据集上进行训练非常容易产生过拟合的现象,大大影响了模型的泛化能力。

    另一方面,如果混合规模较大的低分辨率数据集,其低质量的边缘又会给训练数据引入新的噪声,影响模型在高分辨率场景下的性能。因此我们提出了一个大规模的超高分辨率显著性检测数据集UHRSD(Ultra High-Resolution for Saliency Detection)。

    这个数据集包含共5920张超过4K分辨率(3840*2160)的图片。其中4932张图片和988张图片被分别划分为训练集和测试集。所有的图片都是从开源的图片网站手工挑选的。我们的数据集包含了各种各样丰富的场景,并且显著物体从简单到复杂都有涉及。在显著物体确认的过程中,多名参与者被要求确定同一场景中的显著物体,并经过投票统计得到每张图片中的显著物体。

    对比现有的HRSOD数据集,我们的UHRSD不仅在数据集规模上远远大于HRSOD,同时图片尺寸(对角线长度)更是远超HRSOD数据集中的图片尺寸。此外,为了体现高分辨率图片丰富的细节,显著物体边缘的长度可以从一定程度上反映物体的细节复杂程度,从直方图中可以看到,UHRSD中显著物体的边缘长度更是远远大于HRSOD中的。因此我们的UHRSD对于高分辨率SOD任务来说更具有挑战性和研究价值。


    图2:左图 边缘像素数量直方图对比;右图 图片对角线长度直方图对比


    图3:左图 UHRSD样例及标注;右图其他高分辨率数据集样例及标注

    除了数据集整体的基础属性优秀以外,UHRSD中图像的标注细致程度同样远超现有SOD数据集。左图为我们的UHRSD中的图片,右图为HRSOD数据集中图片。

    相比于低分辨率图片,高分辨率图片所拥有的细节是分割过程中需要重点优化的部分。可以看到,相比于现有高分辨率数据集的粗略标注,精细的标注不仅有助于高分辨率分割模型的训练,也有利于测试阶段对模型捕捉细节能力的准确评估。


    图4:左图 UHRSD样例及标注;右图 低分辨率数据集样例及标注

    最后,我们的数据集不仅在细节上标注更加清晰,相比于低分辨率数据集,我们的标注细粒度更高。左图为UHRSD图片,右图为DUTS-TR中图片。对于同样的显著物体自行车车轮,我们的标注达到了更高的精细标准,经过我们的实验发现,这样的标注偏好会对模型的训练产生影响,这对高分辨率模型的研究有着重要的意义。

     4.方法

    整体架构


    图5:PGNet网络框架图

    我们提出了一种新颖高效的错层嫁接架构(PGNet)用于高分辨率图像的显著性检测。我们使用两个相对较低的特征金字塔和错层连接结构以低成本构成了一个更高的特征金字塔。

    一方面,Transformer编码器可以在低分辨率情况下获取准确的全局语义信息,在高分辨率输入下CNN编码器可以获取丰富的细节信息。另一方面,不同模型提取到的特征拥有更加丰富的特性,可以在显著性的识别过程中起到互补作用。

    在编码阶段,不同分辨率的图像被送入两个编码器中以并行地获取全局语义信息和丰富的细节信息。解码阶段可以被划分为三个小的子阶段,首先是Swin-Transformer特征的解码,随之是嫁接特征的解码,最后是ResNet提取到的特征的解码。

    其中嫁接的特征是由跨模型嫁接模块(CMGM)产生的,在这个模块中全局语义信息从Swin分支被嫁接到ResNet分支。该模块还会产生一个被显式监督的交叉注意力矩阵。整个网络结构以较低的计算成本实现了更深的采样深度,使其可以应对高分辨率输入带来的挑战。


    跨模型嫁接模块


    图6:CMGM模块结构

    我们提出跨模型嫁接模块(Cross Model Grafting Module,CMGM)来对不同骨干网络提取到的特征进行融合。与简单的特征融合方式相比,跨模型嫁接模块通过注意力机制,可以利用Transformer提取到的特征中的全局语义指导组合ResNet提取到的丰富细节特征。

    具体而言,跨模型嫁接模块先把 ResNet 提取到的特征展平成 HW×C 的矩阵,对 Swin Transformer 提取到的特征做同样的处理。受多头注意力机制的启发,我们对两组特征进行层归一化和线性映射,得到新的三个特征(类比注意力中的 Q、K、V),再通过矩阵乘法完成注意力加权。

    随后我们将加权结果进行线性映射并重新恢复为原来的空间形状,再送入卷积层,并经过两个短路连接(如图所示)。除了产生嫁接特征以外,CMGM 还会产生一个交叉注意力矩阵(Cross Attention Matrix, CAM)。
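    下面用一段 PyTorch 草图示意这种"跨模型嫁接"的交叉注意力计算;其中 Q 取自 ResNet 分支、K/V 取自 Transformer 分支的分工、缩放方式与残差连接的位置均为本文的假设性写法,并非官方实现:

```python
import torch
import torch.nn as nn

class CMGMSketch(nn.Module):
    """极简示意:用交叉注意力把 Transformer 分支的全局语义嫁接到 CNN 分支(假设性实现)。"""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.norm_r = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, f_r, f_t):
        # f_r, f_t: (B, HW, C),分别为展平后的 ResNet / Transformer 特征
        q = self.q(self.norm_r(f_r))
        k = self.k(self.norm_t(f_t))
        v = self.v(self.norm_t(f_t))
        attn = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # 交叉注意力矩阵,对应 CAM
        grafted = self.proj(attn.softmax(dim=-1) @ v) + f_r    # 短路连接,保留 CNN 分支细节
        return grafted, attn
```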

    注意力引导损失

    为了更好地将Transformer特征的全局语义信息嫁接到ResNet分支,我们设计了注意力引导损失(Attention Guided Loss, AGL)来辅助这一过程。我们认为CMGM产生的交叉注意力矩阵应该与由真值标签生成的注意力矩阵相似,因为显著的特征之间应该有更高的相似度,即在交叉注意力矩阵中有更高的激活值。


    图7:AGL标签构造方法

    具体而言,我们首先需要构建注意力矩阵的标签。对于一个尺寸为 $H\times W$ 的显著映射 $S$,先将其展平为尺寸为 $HW\times 1$ 的向量 $s$,再与自身做矩阵乘法,得到对应的注意力矩阵:

    $$M = s\,s^{\top},\qquad M(x,y)=s(x,1)\cdot s^{\top}(1,y)$$

    其中 $M(x,y)$ 表示 $M$ 中坐标为 $(x,y)$ 的值,$s(x,1)$ 表示 $s$ 中第 $x$ 行的值,$s^{\top}(1,y)$ 表示 $s$ 的转置矩阵中第 $y$ 列的值。利用该变换,可以分别由真值映射以及两个分支特征生成的显著预测中间结果得到各自的注意力矩阵。我们在加权二元交叉熵损失的基础上构建注意力引导损失,以真值注意力矩阵为标签来监督 CMGM 产生的 CAM,并用超参数 $\lambda$ 调节每个位置上损失的权重。使用权重的主要目的是:(1)矩阵乘法操作使正、负样本的不均衡程度被平方倍放大,加权可以更好地缓解正负样本不均衡的问题;(2)加权可以使网络更关注两个分支共同出错的位置。

    最终的损失由各级显著预测的监督损失与上述注意力引导损失共同构成。
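    下面给出注意力矩阵标签构造与加权二元交叉熵监督的一个示意实现;其中权重的具体形式(取两个分支中间预测与真值注意力矩阵的差异)是本文为说明所作的假设,实际公式以原论文为准。注意 HW×HW 的矩阵较大,实践中一般在较低分辨率的特征图上计算:

```python
import torch
import torch.nn.functional as F

def attention_matrix(sal_map: torch.Tensor) -> torch.Tensor:
    """把 (B, 1, H, W) 的显著图展平为 (B, HW, 1),与自身转置相乘得到 (B, HW, HW) 的注意力矩阵。"""
    v = sal_map.flatten(1).unsqueeze(-1)      # (B, HW, 1)
    return v @ v.transpose(1, 2)              # (B, HW, HW)

def attention_guided_loss(cam_logits: torch.Tensor, gt: torch.Tensor,
                          pred_r: torch.Tensor, pred_t: torch.Tensor,
                          lam: float = 1.0) -> torch.Tensor:
    """示意实现:以真值注意力矩阵为标签,对 CAM(logits)做加权二元交叉熵。

    gt、pred_r、pred_t 均为 (B, 1, H, W) 且取值在 [0, 1];
    权重形式为假设性写法,仅用于演示"让两个分支共同出错的位置获得更大权重"。
    """
    target = attention_matrix(gt)
    err = (attention_matrix(pred_r) - target).abs() + (attention_matrix(pred_t) - target).abs()
    weight = 1.0 + lam * err
    return F.binary_cross_entropy_with_logits(cam_logits, target, weight=weight)
```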

     5.实验结果


    我们用多种训练集进行了训练,提供了包括DUTS-TR、DUTS-TR+HRSOD、UHRSD+HRSOD在内的测试结果。在DUTS-TR以及混合数据集DUTS-TR+HRSOD上训练得到的性能都远超现有的方法。

    当使用混合数据集DUTS-TR+HRSOD时,我们的方法在高分辨率测试集上得到了显著提升,这也许是由高分辨率与低分辨率数据集不同的分布特点导致的。这个结论被在UHRSD+HRSOD混合数据集上训练的结果进一步证实:在这个混合数据集上训练的模型在高分辨率测试集上的结果提升巨大。这也证明了高分辨率训练集对于高分辨率SOD模型监督训练的重要性。


    为了展示高分辨率图像的特性以及我们方法在处理高分辨率图像上的优越性,我们提供了不同SOD方法的可视化结果。我们的方法可以出色地捕捉细节并产生清晰的边界(第1、2行)。

    除了高质量的边界,高分辨率SOD的另一个重要特点是能够分割出在小分辨率下容易被忽视的显著物体的细小结构(第3、5、6行),这同样也证明了我们单阶段方法的优越性。此外,我们的方法在例如第4行的极度复杂的情况下依旧可以很好地工作。

     6.总结

    我们提出了一种新颖高效的错层嫁接架构(PGNet)用于高分辨率图像的显著性检测,包含了跨分辨率的连接架构,以及基于注意力机制的嫁接模块和相应监督损失函数。值得注意的是,我们提供了首个4K分辨率的大规模SOD数据集,希望为未来高分辨率SOD的研究做出贡献。


  • 人工智能-目标检测-基于高阶能量项和学习关联模型的显著目标检测.pdf
  • 人工智能-目标检测-基于小波超复数分数阶傅里叶变换的视觉显著目标检测研究.pdf
  • 显著目标检测是计算机视觉的重要组成部分,目的是检测图像中最吸引人眼的目标区域。针对显著检测中特征的适应性不足以及当前一些算法出现多检与漏检的问题,提出从“目标在哪儿”与“背景在哪儿”两个角度描述显著性...
  • BASNET:边界感知的显著目标检测

    BASNET:边界感知的显著目标检测 摘要 采用深卷积神经网络进行显著目标检测,取得了较好的效果。然而,以前的工作大多侧重于区域精度,而不是边界质量。在本文中,我们提出了一种预测-细化体系结构Basnet和一种新的...
