Classic Papers in Computer Vision

2019-08-24 13:25:09 ctrigger 1,624 views

Classic papers

Computer vision papers

  1. ImageNet classification
  2. Object detection
  3. Object tracking
  4. Low-level vision
  5. Edge detection
  6. Semantic segmentation
  7. Visual attention and saliency
  8. Object recognition
  9. Human pose estimation
  10. Understanding CNNs
  11. Images and language
  12. Image captioning
  13. Video captioning
  14. Image generation
Microsoft ResNet

Paper: Deep Residual Learning for Image Recognition

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1512.03385v1.pdf
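
The central idea of ResNet is the identity shortcut: a block predicts a residual F(x) and outputs F(x) + x, so very deep networks can always fall back to identity mappings. A minimal NumPy sketch of one block (the two linear layers stand in for the paper's stacked convolutions; all shapes and weights here are illustrative, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual block: y = relu(F(x) + x).

    F is two linear layers standing in for the paper's convolutions;
    the identity shortcut `+ x` is the key idea."""
    out = relu(x @ w1)    # first transform + nonlinearity
    out = out @ w2        # second transform, no activation yet
    return relu(out + x)  # add the shortcut, then activate

# With all-zero weights F contributes nothing, so the block degenerates
# to relu(x): extra blocks can always represent an identity mapping.
x = np.array([[1.0, -2.0, 3.0]])
w_zero = np.zeros((3, 3))
y = residual_block(x, w_zero, w_zero)
```

This degeneration to the identity is why adding more such blocks should not hurt a trained shallower network, which is the paper's motivation for going very deep.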

Microsoft PReLU (Parametric Rectified Linear Units / weight initialization)

Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1502.01852.pdf
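
Two ideas from this paper are short enough to state directly: PReLU, f(y) = y for y > 0 and f(y) = a·y otherwise with a learned slope a, and "He" initialization with variance 2/fan_in. A hedged NumPy sketch (a = 0.25 is the paper's initial value, learned during training; the seed and layer sizes are arbitrary):

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope `a` for negative ones.
    The paper initializes a = 0.25 and learns it during training."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, seed=0):
    """'He' initialization: zero-mean Gaussian with std sqrt(2 / fan_in),
    derived so ReLU-family layers preserve activation variance."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

y = prelu(np.array([-2.0, 0.0, 3.0]))
w = he_init(100, 50)
```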

Google Batch Normalization

Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Authors: Sergey Ioffe, Christian Szegedy

Link: http://arxiv.org/pdf/1502.03167.pdf
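
The transform itself is compact: normalize each feature over the mini-batch, then apply a learned scale γ and shift β. A minimal sketch of the training-time forward pass (γ and β are learnable per-feature in practice, scalars here for illustration; inference uses running averages of the batch statistics, omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch norm over a (batch, features) array:
    normalize each feature across the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])
y = batch_norm(x)
```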

Google GoogLeNet

Paper: Going Deeper with Convolutions, CVPR 2015

Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Link: http://arxiv.org/pdf/1409.4842.pdf

Oxford VGG-Net

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

Authors: Karen Simonyan & Andrew Zisserman

Link: http://arxiv.org/pdf/1409.1556.pdf

AlexNet

Paper: ImageNet Classification with Deep Convolutional Neural Networks

Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Link: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Object Detection


PVANET

Paper: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

Authors: Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

Link: http://arxiv.org/pdf/1608.08021

NYU OverFeat

Paper: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR 2014

Authors: Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

Link: http://arxiv.org/pdf/1312.6229.pdf

Berkeley R-CNN

Paper: Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014

Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

Link: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

Microsoft SPP

Paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV 2014

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1406.4729.pdf

Microsoft Fast R-CNN

Paper: Fast R-CNN

Author: Ross Girshick

Link: http://arxiv.org/pdf/1504.08083.pdf

Microsoft Faster R-CNN

Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

Link: http://arxiv.org/pdf/1506.01497.pdf
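
The region-proposal pipeline in R-CNN-style detectors hinges on intersection-over-union (IoU) between boxes, used both to assign proposals to ground truth and for evaluation. A minimal sketch with boxes as [x1, y1, x2, y2] (box coordinates below are made-up examples):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # clamp to 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 corner: intersection 1, union 7.
v = iou([0, 0, 2, 2], [1, 1, 3, 3])
```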

Oxford R-CNN minus R

Paper: R-CNN minus R

Authors: Karel Lenc, Andrea Vedaldi

Link: http://arxiv.org/pdf/1506.06981.pdf

End-to-End People Detection

Paper: End-to-end People Detection in Crowded Scenes

Authors: Russell Stewart, Mykhaylo Andriluka

Link: http://arxiv.org/pdf/1506.04878.pdf

Real-Time Object Detection

Paper: You Only Look Once: Unified, Real-Time Object Detection

Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

Link: http://arxiv.org/pdf/1506.02640.pdf

Inside-Outside Net

Paper: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

Authors: Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

Link: http://arxiv.org/abs/1512.04143

Microsoft ResNet

Paper: Deep Residual Learning for Image Recognition

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1512.03385v1.pdf

R-FCN

Paper: R-FCN: Object Detection via Region-based Fully Convolutional Networks

Authors: Jifeng Dai, Yi Li, Kaiming He, Jian Sun

Link: http://arxiv.org/abs/1605.06409

SSD

Paper: SSD: Single Shot MultiBox Detector

Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

Link: http://arxiv.org/pdf/1512.02325v2.pdf
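
Single-shot detectors such as SSD and YOLO emit many overlapping candidate boxes and rely on non-maximum suppression (NMS) to keep one box per object. A hedged pure-Python sketch of greedy NMS (the 0.5 threshold is a common but arbitrary choice; boxes and scores below are made-up):

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop any
    remaining box whose IoU with it exceeds iou_thresh.
    Boxes are [x1, y1, x2, y2]; returns kept indices."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box heavily overlaps the first
```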

Speed/Accuracy Trade-offs

Paper: Speed/accuracy trade-offs for modern convolutional object detectors

Authors: Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Link: http://arxiv.org/pdf/1611.10012v1.pdf

Object Tracking

  • Paper: Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

Authors: Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

Link: arXiv:1502.06796

  • Paper: DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking

Authors: Hanxi Li, Yi Li and Fatih Porikli

Published in: BMVC, 2014

  • Paper: Learning a Deep Compact Image Representation for Visual Tracking

Authors: N. Wang, D. Y. Yeung

Published in: NIPS, 2013

  • Paper: Hierarchical Convolutional Features for Visual Tracking

Authors: Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang

Published in: ICCV 2015

  • Paper: Visual Tracking with Fully Convolutional Networks

Authors: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu

Published in: ICCV 2015

  • Paper: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

Authors: Hyeonseob Nam and Bohyung Han

Object Recognition

Paper: Is object localization for free? Weakly-supervised learning with convolutional neural networks

Authors: Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf

FV-CNN

Paper: Deep Filter Banks for Texture Recognition and Segmentation

Authors: Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf

Human Pose Estimation

  • Paper: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Authors: Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, CVPR, 2017

  • Paper: DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Authors: Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, CVPR, 2016

  • Paper: Convolutional Pose Machines

Authors: Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, CVPR, 2016

  • Paper: Stacked Hourglass Networks for Human Pose Estimation

Authors: Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV, 2016

  • Paper: Flowing ConvNets for Human Pose Estimation in Videos

Authors: Tomas Pfister, James Charles, and Andrew Zisserman, ICCV, 2015

  • Paper: Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

Authors: Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, NIPS, 2014

Understanding CNNs

  • Paper: Understanding image representations by measuring their equivariance and equivalence

Authors: Karel Lenc, Andrea Vedaldi, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

  • Paper: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Authors: Anh Nguyen, Jason Yosinski, Jeff Clune, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

  • Paper: Understanding Deep Image Representations by Inverting Them

Authors: Aravindh Mahendran, Andrea Vedaldi, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

  • Paper: Object Detectors Emerge in Deep Scene CNNs

Authors: Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, ICLR, 2015

Link: http://arxiv.org/abs/1412.6856

  • Paper: Inverting Visual Representations with Convolutional Networks

Authors: Alexey Dosovitskiy, Thomas Brox, arXiv, 2015

Link: http://arxiv.org/abs/1506.02753

  • Paper: Visualizing and Understanding Convolutional Networks

Authors: Matthew Zeiler, Rob Fergus, ECCV, 2014

Link: http://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

Images and Language

Image Captioning

UCLA / Baidu

Explain Images with Multimodal Recurrent Neural Networks

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, arXiv:1410.1090

http://arxiv.org/pdf/1410.1090

Toronto

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, arXiv:1411.2539.

http://arxiv.org/pdf/1411.2539

Berkeley

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, arXiv:1411.4389.

http://arxiv.org/pdf/1411.4389

Google

Show and Tell: A Neural Image Caption Generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, arXiv:1411.4555.

http://arxiv.org/pdf/1411.4555

Stanford

Deep Visual-Semantic Alignments for Generating Image Descriptions

Andrej Karpathy, Li Fei-Fei, CVPR, 2015.

Web:http://cs.stanford.edu/people/karpathy/deepimagesent/

Paper:http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

UML / UT

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, NAACL-HLT, 2015.

http://arxiv.org/pdf/1412.4729

CMU / Microsoft

Learning a Recurrent Visual Representation for Image Caption Generation

Xinlei Chen, C. Lawrence Zitnick, arXiv:1411.5654.

Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015

http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

Microsoft

From Captions to Visual Concepts and Back

Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR, 2015.

http://arxiv.org/pdf/1411.4952

Univ. Montreal / Univ. Toronto

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, arXiv:1502.03044 / ICML 2015

http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

Idiap / EPFL / Facebook

Phrase-based Image Captioning

Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, arXiv:1502.03671 / ICML 2015

http://arxiv.org/pdf/1502.03671

UCLA / Baidu

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, arXiv:1504.06692

http://arxiv.org/pdf/1504.06692

MS + Berkeley

Exploring Nearest Neighbor Approaches for Image Captioning

Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, arXiv:1505.04467

http://arxiv.org/pdf/1505.04467.pdf

Language Models for Image Captioning: The Quirks and What Works

Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, arXiv:1505.01809

http://arxiv.org/pdf/1505.01809.pdf

Adelaide

Image Captioning with an Intermediate Attributes Layer

Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, arXiv:1506.01144

Tilburg

Learning Language through Pictures

Grzegorz Chrupala, Akos Kadar, Afra Alishahi, arXiv:1506.03694

Univ. of Montreal

Describing Multimedia Content using Attention-based Encoder-Decoder Networks

Kyunghyun Cho, Aaron Courville, Yoshua Bengio, arXiv:1507.01053

Cornell

Image Representations and New Domains in Neural Image Captioning

Jack Hessel, Nicolas Savva, Michael J. Wilber, arXiv:1508.02091

MS + City Univ. of Hong Kong

Learning Query and Image Similarities with Ranking Canonical Correlation Analysis

Ting Yao, Tao Mei, and Chong-Wah Ngo, ICCV, 2015

Video Captioning

Berkeley

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

UT / UML / Berkeley

Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

Microsoft

Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

UT / UML / Berkeley

Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.

Univ. of Montreal / Sherbrooke

Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

MPI / Berkeley

Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698

Univ. of Toronto / MIT

Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

Univ. of Montreal

Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

TAU / USC

Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

Image Generation

Convolutional / Recurrent Networks

  • Paper: Conditional Image Generation with PixelCNN Decoders

Authors: Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

  • Paper: Learning to Generate Chairs with Convolutional Neural Networks

Authors: Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

Published in: CVPR, 2015

  • Paper: DRAW: A Recurrent Neural Network For Image Generation

Authors: Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

Published in: ICML, 2015

Adversarial Networks

  • Paper: Generative Adversarial Networks

Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Published in: NIPS, 2014
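
The GAN objective is a two-player minimax game: the discriminator D maximizes E[log D(x)] + E[log(1 − D(G(z)))], while the generator G minimizes the second term. A small numeric sketch of D's loss on assumed toy probabilities (the values 0.5, 0.99, 0.01 are illustrative only):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negated GAN value function for D: push D's probability toward 1
    on real samples (d_real) and toward 0 on generated ones (d_fake)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())

# A discriminator that cannot tell real from fake (0.5 everywhere)
# sits at loss 2*log(2); a sharper one gets closer to 0.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```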

  • Paper: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

Published in: NIPS, 2015

  • Paper: A Note on the Evaluation of Generative Models

Authors: Lucas Theis, Aäron van den Oord, Matthias Bethge

Published in: ICLR 2016

  • Paper: Variationally Auto-Encoded Deep Gaussian Processes

Authors: Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence

Published in: ICLR 2016

  • Paper: Generating Images from Captions with Attention

Authors: Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

Published in: ICLR 2016

  • Paper: Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

Authors: Jost Tobias Springenberg

Published in: ICLR 2016

  • Paper: Censoring Representations with an Adversary

Authors: Harrison Edwards, Amos Storkey

Published in: ICLR 2016

  • Paper: Distributional Smoothing with Virtual Adversarial Training

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

Published in: ICLR 2016

  • Paper: Generative Visual Manipulation on the Natural Image Manifold

Authors: Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros

Published in: ECCV 2016

  • Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Authors: Alec Radford, Luke Metz, Soumith Chintala

Published in: ICLR 2016

Question Answering

Virginia Tech / MSR

Paper: VQA: Visual Question Answering, CVPR, 2015 (SUNw: Scene Understanding workshop)

Authors: Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

MPI / Berkeley

Paper: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

Authors: Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

Published: arXiv:1505.01121

Toronto

Paper: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Authors: Mengye Ren, Ryan Kiros, Richard Zemel

Published: arXiv:1505.02074 / ICML 2015 deep learning workshop

Baidu / UCLA

Paper: Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

Authors: Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu

Published: arXiv:1505.05612

POSTECH (Korea)

Paper: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Authors: Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han

Published: arXiv:1511.05765

CMU / Microsoft Research

Paper: Stacked Attention Networks for Image Question Answering

Authors: Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015)

Published: arXiv:1511.02274

MetaMind

Paper: Dynamic Memory Networks for Visual and Textual Question Answering

Authors: Caiming Xiong, Stephen Merity, and Richard Socher

Published: arXiv:1603.01417 (2016)

Seoul National Univ. + NAVER

Paper: Multimodal Residual Learning for Visual QA

Authors: Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Published: arXiv:1606.01455

UC Berkeley + Sony

Paper: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Authors: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach

Published: arXiv:1606.01847

POSTECH

Paper: Training Recurrent Answering Units with Joint Loss Minimization for VQA

Authors: Hyeonwoo Noh and Bohyung Han

Published: arXiv:1606.03647

Seoul National Univ. + NAVER

Paper: Hadamard Product for Low-rank Bilinear Pooling

Authors: Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Published: arXiv:1610.04325

Visual Attention and Saliency

Predicting Eye Fixations

Paper: Predicting Eye Fixations using Convolutional Neural Networks

Authors: Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu

Published in: CVPR, 2015

Learning a Sequential Search for Landmarks

Paper: Learning a Sequential Search for Landmarks

Authors: Saurabh Singh, Derek Hoiem, David Forsyth

Published in: CVPR, 2015

Multiple Object Recognition with Visual Attention

Paper: Multiple Object Recognition with Visual Attention

Authors: Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu

Published in: ICLR, 2015

Recurrent Models of Visual Attention

Paper: Recurrent Models of Visual Attention

Authors: Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Published in: NIPS, 2014

Low-Level Vision

Super-Resolution

  • Iterative Image Reconstruction

Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.

Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001.

  • Super-Resolution (SRCNN)

Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.

Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.

  • Very Deep Super-Resolution

Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015.

  • Deeply-Recursive Convolutional Network

Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015.

  • Cascade-Sparse-Coding-Network

Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.

  • Perceptual Losses for Super-Resolution

Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.

  • SRGAN

Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016.
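
The super-resolution papers above are conventionally compared by PSNR, which is just a log-scaled mean squared error. A minimal sketch (assuming images scaled to [0, 1]; the toy arrays are illustrative):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE 0.01, i.e. 20 dB.
val = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
```

Note that SRGAN argues PSNR correlates poorly with perceptual quality, which is why that line of work adds adversarial and perceptual losses.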

Other Applications

Optical Flow (FlowNet)

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.

Compression Artifacts Reduction

Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.

Blur Removal

Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444

Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015

Image Deconvolution

Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.

Deep Edge-Aware Filter

Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.

Computing the Stereo Matching Cost with a Convolutional Neural Network

Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.

Colorful Image Colorization

Richard Zhang, Phillip Isola, Alexei A. Efros, Colorful Image Colorization, ECCV, 2016

Feature Learning by Inpainting

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

Edge Detection

Holistically-Nested Edge Detection

Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.

DeepEdge

Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.

DeepContour

Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

Semantic Segmentation


SEC: Seed, Expand and Constrain

Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016.

Adelaide

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. (1st ranked in VOC2012)

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. (4th ranked in VOC2012)

Deep Parsing Network (DPN)

Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 (2nd ranked in VOC 2012)

CentraleSuperBoundaries, INRIA

Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1511.07386 (4th ranked in VOC 2012)

BoxSup

Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)

POSTECH

Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. (7th ranked in VOC2012)

Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924.

Seunghoon Hong, Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928

Conditional Random Fields as Recurrent Neural Networks

Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)

DeepLab

Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. (9th ranked in VOC2012)

Zoom-out

Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015

Joint Calibration

Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.

Fully Convolutional Networks for Semantic Segmentation

Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.
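
The segmentation results in this section (VOC2012 rankings included) are scored by per-class intersection-over-union between predicted and ground-truth pixel label maps, averaged over classes. A hedged sketch of the per-class part (the tiny 2×2 label maps are made-up):

```python
import numpy as np

def class_iou(pred, target, cls):
    """IoU of one class between two integer label maps of equal shape:
    pixels labeled `cls` in both, over pixels labeled `cls` in either."""
    p = (pred == cls)
    t = (target == cls)
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union

pred = np.array([[1, 1],
                 [0, 0]])
target = np.array([[1, 0],
                   [0, 0]])
v = class_iou(pred, target, 1)  # intersection: 1 pixel, union: 2 pixels
```

Mean IoU is then the average of this quantity over all classes present in the dataset.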

Hypercolumn

Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.

Deep Hierarchical Parsing

Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.

Learning Hierarchical Features for Scene Labeling

Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.

Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.

University of Cambridge

Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015.

Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015.

Princeton

Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016

Univ. of Washington, Allen AI

Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015

INRIA

Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016

UCSB

Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015

Other Resources

Courses

Deep Vision

[Stanford] CS231n: Convolutional Neural Networks for Visual Recognition

[CUHK] ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)

· More recommended deep learning courses

[Stanford] CS224d: Deep Learning for Natural Language Processing

[Oxford] Deep Learning by Prof. Nando de Freitas

[NYU] Deep Learning by Prof. Yann LeCun

Books

Free online books

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Neural Networks and Deep Learning by Michael Nielsen

Deep Learning Tutorial by LISA lab, University of Montreal

Videos

Talks

Deep Learning, Self-Taught Learning and Unsupervised Feature Learning By Andrew Ng

Recent Developments in Deep Learning By Geoff Hinton

The Unreasonable Effectiveness of Deep Learning by Yann LeCun

Deep Learning of Representations by Yoshua Bengio

Software

Frameworks

  • Tensorflow: An open source software library for numerical computation using data flow graph by Google [Web]
  • Torch7: Deep learning library in Lua, used by Facebook and Google Deepmind [Web]
  • Torch-based deep learning libraries: [torchnet]
  • Caffe: Deep learning framework by the BVLC [Web]
  • Theano: Mathematical library in Python, maintained by LISA lab [Web]
  • Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
  • MatConvNet: CNNs for MATLAB [Web]
  • MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
  • Deepgaze: A computer vision library for human-computer interaction based on CNNs [Web]

Applications

  • Adversarial training: code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
  • Understanding and visualization: source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
  • Semantic segmentation: source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web]; source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
  • Super-resolution: image super-resolution for anime-style art [Web]
  • Edge detection: source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
  • Source code for the paper “Holistically-Nested Edge Detection”, ICCV 2015. [Web]

Tutorials

  • [CVPR 2014] Tutorial on Deep Learning in Computer Vision
  • [CVPR 2015] Applied Deep Learning for Computer Vision with Torch

Blogs

  • Deep down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision Blog
  • CVPR recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s Blog
  • Facebook’s AI Painting@Wired
  • Inceptionism: Going Deeper into Neural Networks@Google Research
  • Implementing Neural Networks

2019-01-28 19:02:54 Extremevision 3,813 views

Author: Zhu Zheng
Original: CV arXiv Daily: daily selected computer vision papers (2019/1/23-2019/1/28)

This series is reposted, with permission, from the WeChat public account of Zhu Zheng (CV arXiv Daily). It filters each day's computer vision papers on arXiv, focusing on: object detection, image segmentation, single/multi-object tracking, action recognition, human pose estimation and tracking, person re-identification, GANs, and model search.

2019/1/28

[1] Self-supervised representation learning from Google
Revisiting Self-Supervised Visual Representation Learning
Paper: https://arxiv.org/abs/1901.09005
Code: https://github.com/google/revisiting-self-supervised
Abstract: Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), has not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin.


[2] ICLR 2019 GAN paper
Diversity-Sensitive Conditional Generative Adversarial Networks
Paper: https://arxiv.org/abs/1901.09024
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address such issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.


[3] Q-learning for Dou Di Zhu, from Cewu Lu's group at Shanghai Jiao Tong University
Combinational Q-Learning for Dou Di Zhu
Paper: https://arxiv.org/abs/1901.08925
Code: https://github.com/qq456cvb/doudizhu-C
Abstract: Deep reinforcement learning (DRL) has gained a lot of attention in recent years, and has been proven to be able to play Atari games and Go at or above human levels. However, those games are assumed to have a small fixed number of actions and could be trained with a simple CNN network. In this paper, we study a special class of Asian popular card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods like naive Q-learning and A3C. We develop easy-to-use card game environments and train all agents adversarially from scratch, with only knowledge of game rules, and verify that our agents are comparable to humans. Our code to reproduce all reported results will be available online.


[4] WACV 2019 paper on 3D point clouds
Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Paper: https://arxiv.org/abs/1901.08906
Abstract: Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
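The hierarchical idea (predict coarse points, then let each point spawn children on a deformed local grid) can be sketched as follows. This is a minimal NumPy illustration in which the fixed offsets stand in for the learned, feature-conditioned grid deformation of the actual network.

```python
import numpy as np

def upsample_points(points,
                    grid=((0.05, 0, 0), (-0.05, 0, 0),
                          (0, 0.05, 0), (0, -0.05, 0))):
    # Each existing point spawns len(grid) children by placing a small
    # local grid around it, multiplying the resolution at every stage.
    offsets = np.asarray(grid)                          # (k, 3)
    children = points[:, None, :] + offsets[None, :, :]  # (N, k, 3)
    return children.reshape(-1, 3)

coarse = np.zeros((16, 3))                  # toy low-resolution prediction
dense = upsample_points(upsample_points(coarse))
assert dense.shape == (16 * 4 * 4, 3)       # two stages of 4x upsampling
```

Two pyramid stages turn 16 points into 256; the real model conditions the offsets on aggregated local and global features instead of using constants.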


[5] Multi-target multi-camera tracking paper
Multiple Hypothesis Tracking Algorithm for Multi-Target Multi-Camera Tracking with Disjoint Views
Paper: https://arxiv.org/abs/1901.08787
Abstract: In this study, a multiple hypothesis tracking (MHT) algorithm for multi-target multi-camera tracking (MCT) with disjoint views is proposed. Our method forms track-hypothesis trees, and each of their branches represents a multi-camera track of a target that may move within a camera as well as across cameras. Furthermore, multi-target tracking within a camera is performed simultaneously with the tree formation by manipulating a status of each track hypothesis. Each status represents one of three stages of a multi-camera track: tracking, searching, and end-of-track. The tracking status means targets are tracked by a single-camera tracker. In the searching status, disappeared targets are examined to see whether they reappear in other cameras. The end-of-track status indicates that the target has exited the camera network owing to its lengthy invisibility. These three statuses assist MHT in forming the track-hypothesis trees for multi-camera tracking. We also present a gating technique for eliminating unlikely observation-to-track associations. In the experiments, we evaluate the proposed method on two datasets, DukeMTMC and NLPR-MCT, demonstrating that it outperforms the state-of-the-art method in terms of accuracy. In addition, we show that the proposed method can operate in real time and online.
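The gating step mentioned in the abstract is, in classical MHT, a chi-square test on the Mahalanobis distance between a track's prediction and a candidate observation. Below is a minimal sketch under simplifying assumptions (a 2-D state with a Gaussian track model), not the paper's exact formulation:

```python
import numpy as np

def gate(track_mean, track_cov, observation, threshold=9.21):
    # Squared Mahalanobis distance between a track's predicted state and a
    # candidate observation; associations above the chi-square threshold
    # (9.21 ~ 99% quantile for 2 dof) are pruned before the hypothesis
    # tree is expanded, keeping the tree tractable.
    d = observation - track_mean
    m2 = d @ np.linalg.inv(track_cov) @ d
    return m2 <= threshold

cov = np.eye(2)
assert gate(np.array([0.0, 0.0]), cov, np.array([1.0, 1.0]))      # d^2 = 2
assert not gate(np.array([0.0, 0.0]), cov, np.array([4.0, 4.0]))  # d^2 = 32
```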


[6] One-class CNN paper
One-Class Convolutional Neural Network
Paper: https://arxiv.org/abs/1901.08688
Code: https://github.com/otkupjnoz/oc-cnn
Abstract: We present a novel Convolutional Neural Network (CNN) based approach for one-class classification. The idea is to use zero-centered Gaussian noise in the latent space as the pseudo-negative class and train the network using the cross-entropy loss to learn a good representation as well as the decision boundary for the given class. A key feature of the proposed approach is that any pre-trained CNN can be used as the base network for one-class classification. The proposed One-Class CNN (OC-CNN) is evaluated on the UMDAA-02 Face, Abnormality-1001, and FounderType-200 datasets. These datasets are related to a variety of one-class application problems such as user authentication, abnormality detection, and novelty detection. Extensive experiments demonstrate that the proposed method achieves significant improvements over recent state-of-the-art methods. The source code is available at: https://github.com/otkupjnoz/oc-cnn.
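A minimal sketch of the core idea, with loudly labeled assumptions: the "backbone features" below are faked with a shifted Gaussian, and a plain logistic-regression head stands in for the network's classifier. Zero-centered Gaussian noise plays the pseudo-negative class, turning one-class learning into ordinary binary cross-entropy training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake features of the known (positive) class from a frozen backbone.
pos = rng.normal(loc=3.0, scale=0.5, size=(64, 16))
# Zero-centered Gaussian noise in the same latent space acts as the
# pseudo-negative class.
neg = rng.normal(loc=0.0, scale=1.0, size=(64, 16))

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(64), np.zeros(64)])

# A few steps of gradient descent on the binary cross-entropy; this
# linear head stands in for the small classifier on top of the CNN.
w, b = np.zeros(16), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.1 * X.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = (((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()
```

The learned boundary separates the single real class from the synthetic noise, which is exactly the decision boundary a one-class detector needs.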


[7] In defense of the triplet loss paper
In Defense of the Triplet Loss for Visual Recognition
Paper: https://arxiv.org/abs/1901.08616
Abstract: We employ triplet loss as a space-embedding regularizer to boost classification performance. Standard architectures, like ResNet and DenseNet, are extended to support both losses with minimal hyper-parameter tuning. This promotes generality while fine-tuning pretrained networks. Triplet loss is a powerful surrogate for recently proposed embedding regularizers, yet it is often avoided because of its large batch-size requirement and high computational cost. Through our experiments, we re-assess these assumptions. During inference, our network supports both classification and embedding tasks without any computational overhead. Quantitative evaluation highlights how our approach compares favorably to the existing state of the art on multiple fine-grained recognition datasets. Further evaluation on an imbalanced video dataset achieves a significant improvement (>7%). Beyond boosting efficiency, triplet loss brings retrieval and interpretability to classification models.
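For reference, the triplet loss discussed above is just a margin condition on squared distances in the embedding space; a minimal sketch:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive and push it from the negative
    # until the gap exceeds the margin: max(0, d(a,p) - d(a,n) + margin).
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((anchor - negative) ** 2)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same class, close by
n = np.array([1.0, 0.0])   # different class, far away
assert triplet_loss(a, p, n) == 0.0   # margin already satisfied
assert triplet_loss(a, n, p) > 0.0    # violated triplet is penalized
```

The batch-size cost the abstract mentions comes from mining informative (a, p, n) triples inside each batch, not from the loss itself.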

2019/1/26

Summary of the SiamRPN line of work

[0] SiamFC: improves on SINT (Siamese Instance Search for Tracking, CVPR 2016) and is the first paper to solve tracking with a fully-convolutional Siamese network; it can be viewed as SiamRPN with a single anchor.
Title: Fully-Convolutional Siamese Networks for Object Tracking
Paper: https://arxiv.org/abs/1606.09549
Project: https://www.robots.ox.ac.uk/~luca/siamese-fc.html
TensorFlow implementation: https://github.com/torrvision/siamfc-tf
PyTorch implementation: https://github.com/rafellerc/Pytorch-SiamFC
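The heart of SiamFC is a cross-correlation between the exemplar (template) features and the search-region features; the peak of the response map gives the tracked position. A naive NumPy sketch of that operation (real implementations run it as a convolution over deep feature maps, not raw pixels):

```python
import numpy as np

def xcorr(search, template):
    # Slide the template over the search region and take the inner
    # product at every offset: SiamFC's response map.
    H, W = search.shape
    h, w = template.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + h, j:j + w] * template)
    return out

search = np.zeros((8, 8))
search[3:5, 3:5] = 1.0          # the "target" appears at offset (3, 3)
template = np.ones((2, 2))
response = xcorr(search, template)
peak = np.unravel_index(response.argmax(), response.shape)
assert peak == (3, 3)
```

Because the same operation scores every offset at once, a single forward pass localizes the target in the whole search region.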


[0.1] The later v2 version, CFNet, replaces the correlation operation with a correlation-filter layer.
Title: End-to-End Representation Learning for Correlation Filter Based Tracking
Paper: http://openaccess.thecvf.com/content_cvpr_2017/html/Valmadre_End-To-End_Representation_Learning_CVPR_2017_paper.html
Project: http://www.robots.ox.ac.uk/~luca/cfnet.html
MatConvNet implementation: https://github.com/bertinetto/cfnet
SiamFC was followed by many improvements, for example:


[0.2] StructSiam, which considers local structures during tracking
Title: Structured Siamese Network for Real-Time Visual Tracking
Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Yunhua_Zhang_Structured_Siamese_Network_ECCV_2018_paper.pdf


[0.3] SiamFC-tri, which introduces a triplet loss into the Siamese tracking network
Title: Triplet Loss in Siamese Network for Object Tracking
Paper: http://openaccess.thecvf.com/content_ECCV_2018/papers/Xingping_Dong_Triplet_Loss_with_ECCV_2018_paper.pdf


[0.4] DSiam, a dynamic Siamese network
Title: Learning Dynamic Siamese Network for Visual Object Tracking
Paper: http://openaccess.thecvf.com/content_ICCV_2017/papers/Guo_Learning_Dynamic_Siamese_ICCV_2017_paper.pdf
Code: https://github.com/tsingqguo/DSiam


[0.5] SA-Siam, a twofold Siamese network
Title: A Twofold Siamese Network for Real-Time Object Tracking
Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/He_A_Twofold_Siamese_CVPR_2018_paper.pdf


[1] SiamRPN: applies anchors at every position of the candidate region, performing classification and regression simultaneously; in effect, one-shot local detection.
Title: High Performance Visual Tracking with Siamese Region Proposal Network
Paper: http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf
Project: http://bo-li.info/SiamRPN/
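The anchor grid that SiamRPN lays over the correlation response can be sketched as below. This is a hedged illustration: the stride, single scale, and five aspect ratios follow the commonly cited configuration, not necessarily the released code.

```python
import numpy as np

def make_anchors(score_size, stride=8, scales=(8,),
                 ratios=(0.33, 0.5, 1, 2, 3)):
    # One (cx, cy, w, h) box per aspect ratio at every position of the
    # response map, mirroring the 5-anchor setup often used in SiamRPN.
    base = stride * scales[0]
    whs = []
    for r in ratios:
        w = base / np.sqrt(r)        # keep the area fixed, vary h/w = r
        whs.append((w, w * r))
    ys, xs = np.mgrid[:score_size, :score_size] * stride
    anchors = np.array([(x, y, w, h)
                        for y, x in zip(ys.ravel(), xs.ravel())
                        for (w, h) in whs])
    return anchors

anchors = make_anchors(17)
assert anchors.shape == (17 * 17 * 5, 4)   # 5 anchors per response cell
```

Classification scores one branch per anchor; regression refines each anchor's (x, y, w, h), which is what makes the tracker a one-shot local detector.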


[2] DaSiamRPN, a follow-up to SiamRPN that focuses on sample imbalance during training, adding more kinds of positive samples and semantically meaningful negative samples.
Title: Distractor-aware Siamese Networks for Visual Object Tracking
Paper: https://arxiv.org/abs/1808.06048
Project: http://bo-li.info/DaSiamRPN/
Test code: https://github.com/foolwood/DaSiamRPN


[3] Cascaded SiamRPN: cascades several RPN modules and exploits features from different layers.
Title: Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking
Paper: https://arxiv.org/abs/1812.06148


[4] SiamMask: adds a mask branch to the SiamRPN architecture, performing tracking and video segmentation jointly.
Title: Fast Online Object Tracking and Segmentation: A Unifying Approach
Paper: https://arxiv.org/abs/1812.05050
Project: http://www.robots.ox.ac.uk/~qwang/SiamMask/


[5] SiamRPN++, a follow-up to SiamRPN that finally makes modern backbones such as ResNet work for tracking; state of the art on essentially every benchmark.
Title: SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Paper: https://arxiv.org/abs/1812.11703
Project: http://bo-li.info/SiamRPN++/


[6] Deeper and Wider SiamRPN: deepens and widens the network for better performance, with a focus on the effects of receptive field and padding.
Title: Deeper and Wider Siamese Networks for Real-Time Visual Tracking
Paper: https://arxiv.org/abs/1901.01660
Test code: https://gitlab.com/MSRA_NLPR/deeper_wider_siamese_trackers

2019/1/25


[1] Salient object detection paper
Deep Reasoning with Multi-scale Context for Salient Object Detection
Paper: https://arxiv.org/abs/1901.08362


[2] Survey of anomaly detection in road-traffic scenes
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Paper: https://arxiv.org/abs/1901.08292


[3] 3D object detection
3D Backbone Network for 3D Object Detection
Paper: https://arxiv.org/abs/1901.08373


[4] Semantic segmentation paper
Application of Decision Rules for Handling Class Imbalance in Semantic Segmentation
Paper: https://arxiv.org/abs/1901.08394


[5] Object detection paper
Object Detection based on Region Decomposition and Assembly
Paper: https://arxiv.org/abs/1901.08225


[6] Hypergraph convolution paper from Oxford
Hypergraph Convolution and Hypergraph Attention
Paper: https://arxiv.org/abs/1901.08150

2019/1/24

[1] Technical report on JD's runner-up entry in PoseTrack 2018
A Top-down Approach to Articulated Human Pose Estimation and Tracking
Paper: https://arxiv.org/abs/1901.07680


[2] Network-compression paper submitted to TNNLS
Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
Paper: https://arxiv.org/abs/1901.07827
Code: https://github.com/ShaohuiLin/SSR


[3] DeepFashion2 dataset from CUHK & SenseTime
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Paper: https://arxiv.org/abs/1901.07973
Code: https://github.com/switchablenorms/DeepFashion2

[4] Object detection paper
Bottom-up Object Detection by Grouping Extreme and Center Points
Paper: https://arxiv.org/abs/1901.08043
Code: https://github.com/xingyizhou/ExtremeNet

2019/1/23

[1] SenseTime's winning entry in the COCO 2018 detection task
Winning entry of the COCO 2018 Challenge (object detection task): Hybrid Task Cascade for Instance Segmentation
https://arxiv.org/abs/1901.07518


[2] Xiaomi's technical report on super-resolution with neural architecture search
Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
https://arxiv.org/abs/1901.07261


[3] Object detection paper
Consistent Optimization for Single-Shot Object Detection
https://arxiv.org/abs/1901.06563


[4] SenseTime's paper on imbalanced-data classification
Dynamic Curriculum Learning for Imbalanced Data Classification
https://arxiv.org/abs/1901.06783


[5] Face detection paper
Improved Selective Refinement Network for Face Detection
https://arxiv.org/abs/1901.06651


[6] Megvii's retail-product dataset
RPC: A Large-Scale Retail Product Checkout Dataset
https://arxiv.org/abs/1901.07249


[7] Survey of pedestrian attribute recognition
Pedestrian Attribute Recognition: A Survey
https://arxiv.org/abs/1901.07474
Project: https://sites.google.com/view/ahu-pedestrianattributes/


Recommended articles

2018-07-17 17:04:14 csuyzt · 1,393 views

Computer-vision papers organized, translated, annotated, and shared here,

covering image classification, object detection, visual/object tracking, face recognition/verification, and more.

Stars, questions, and corrections are all welcome, as are contributions; friends in Changsha are welcome to meet up offline.

Continuously updated...

Project: https://github.com/yizt/cv-papers

Backbone networks

ResNeXt

Object detection

R-CNN family

R-CNN

Fast R-CNN

Faster R-CNN

FPN

Mask R-CNN

R-FCN

R-FCN-3000

Cascade R-CNN

YOLO

yolo v1

yolo 9000

yolo v3

SSD

SSD

DSSD

Others

AttractioNet

G-CNN

RetinaNet

Face recognition

FaceNet

Visual tracking

Online Object Tracking: A Benchmark

FCNT

GOTURN

C-COT

SiameseFC

OCR / scene text detection

CRNN

CTPN

Appendix: collected links to classic computer-vision papers

Prerequisites

Proof of the Hammersley-Clifford theorem

 

2017-01-19 18:01:31 wangss9566 · 31,520 views

Getting Started with Computer Vision (1): Overview

Since the second semester of my sophomore year I have been taking courses and reading papers on computer vision and machine learning, and I have also worked on a few projects first-hand. Looking back, the road was full of detours, and only now would I dare say I have barely gotten started. I am therefore writing an introductory series on computer vision: partly to consolidate and review what I have learned, partly in the hope of sharing some experience with newcomers, and of course in the hope that experts will point out mistakes and offer guidance. As I am close to graduating, I cannot guarantee a regular update schedule; besides theory, the series will also include a few small hands-on projects, and I will keep updating it as much as I can.

Because the original papers behind many concepts and theories were published in English journals, technical terms are given with their original English phrases in parentheses for reference.


Contents

  • Introduction
  • Directions
  • Hot topics

Introduction

Computer vision, also known as machine vision, is, as the name suggests, the discipline of teaching computers how to "see" the world. Amid the machine-learning boom, computer vision ranks alongside natural language processing (NLP) and speech recognition as one of the three hottest directions in machine learning. The field has also gradually shifted from combinations of hand-crafted features, such as the histogram of oriented gradients (HOG) and the scale-invariant feature transform (SIFT), with shallow models, toward deep-learning models represented by the convolutional neural network (CNN).

| Approach | Feature extraction | Decision model |
| --- | --- | --- |
| Traditional | SIFT, HOG, raw pixels, ... | SVM, random forest, linear regression, ... |
| Deep learning | CNN, ... | CNN, ... |

SVM: Support Vector Machine
Random Forest: an ensemble of decision trees
Linear Regression: ordinary linear regression
Raw Pixel: raw pixel intensities

Traditional computer-vision solutions basically follow the pipeline: image preprocessing → feature extraction → model building (classifier/regressor) → output. In deep learning, most problems are instead tackled end to end, going from input to output in one pass. This introductory series will start from shallow learning and progress step by step to deep learning.
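The traditional pipeline above can be sketched end to end with toy stand-ins. Both helpers below are hypothetical placeholders: the crude global orientation histogram merely imitates a real HOG/SIFT descriptor, and a nearest-class-mean rule replaces an SVM or random forest.

```python
import numpy as np

def extract_feature(image, bins=9):
    # Orientation histogram of image gradients: a crude, whole-image
    # stand-in for a real HOG descriptor (feature-extraction stage).
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def classify(feature, class_means):
    # Decision-model stage: nearest class mean in feature space.
    dists = [np.linalg.norm(feature - m) for m in class_means]
    return int(np.argmin(dists))

# preprocess -> feature -> model -> output
img = np.tile(np.arange(16), (16, 1))   # texture with a pure horizontal gradient
f = extract_feature(img)
label = classify(f, class_means=[f, np.ones(9) / 9])
```

An end-to-end CNN collapses these hand-written stages into one learned function from pixels to labels.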

Directions

Computer vision itself spans many research directions. The more fundamental and popular ones include object detection and recognition, semantic segmentation, motion and tracking, 3D reconstruction, visual question answering (VQA), and action recognition.

Object recognition and detection

Object detection has always been a fundamental and important research direction in computer vision. Most new algorithms or deep network architectures, such as VGG-Net, GoogLeNet, and ResNet, are first applied to object detection; every year new algorithms keep emerging on the ImageNet dataset, breaking records again and again. These new algorithms or architectures quickly become the year's hot topics and are adapted to other computer-vision applications, and it is fair to say that plenty of low-effort papers ride along as well.

Object recognition and detection means that, given an input image, an algorithm automatically finds the common objects in the image and outputs their categories and locations. This naturally gives rise to more specialized detectors such as face detection and vehicle detection.

Representative recent papers

  1. He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  2. Liu, Wei, et al. “SSD: Single shot multibox detector.” European Conference on Computer Vision. Springer International Publishing, 2016.
  3. Szegedy, Christian, et al. “Going deeper with convolutions.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  4. Ren, Shaoqing, et al. “Faster r-cnn: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems. 2015.
  5. Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
  6. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.

Datasets

  1. IMAGENET
  2. PASCAL VOC
  3. MS COCO
  4. Caltech

Semantic segmentation

Semantic segmentation has been a very hot direction in recent years. Simply put, it can be viewed as a special kind of classification: every pixel of the input image is assigned a class, which a single image can illustrate very clearly.
It is easy to see the difference: object detection and recognition usually draw a box around an object in the original image, operating on "macroscopic" whole objects, whereas semantic segmentation classifies every individual pixel, so each pixel in the image has its own category.
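Once a network has produced per-class score maps, the per-pixel classification view of segmentation is literally an argmax over the class axis:

```python
import numpy as np

# A network emits a (C, H, W) score volume; every pixel takes the class
# with the highest score. Here the scores are hand-crafted for clarity.
C, H, W = 3, 4, 4
scores = np.zeros((C, H, W))
scores[1, :, :2] = 5.0      # left half scores highest for class 1
scores[2, :, 2:] = 5.0      # right half scores highest for class 2

label_map = scores.argmax(axis=0)          # (H, W) per-pixel labels
assert label_map.shape == (4, 4)
```

Real segmentation networks differ in how they produce the score volume (FCNs, deconvolution networks, CRF refinement), but the per-pixel decision at the end is exactly this.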

Representative recent papers

  1. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
  2. Chen, Liang-Chieh, et al. “Semantic image segmentation with deep convolutional nets and fully connected crfs.” arXiv preprint arXiv:1412.7062 (2014).
  3. Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. “Learning deconvolution network for semantic segmentation.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
  4. Zheng, Shuai, et al. “Conditional random fields as recurrent neural networks.” Proceedings of the IEEE International Conference on Computer Vision. 2015.

Datasets

  1. PASCAL VOC
  2. MS COCO

Motion and tracking

Tracking is also one of the fundamental problems in computer vision. It has developed considerably in recent years, with methods moving from non-deep algorithms toward deep learning and accuracy rising steadily. However, real-time deep trackers have struggled to improve in accuracy, while the most accurate trackers are extremely slow, so they are hard to deploy in practice.
So what is tracking? At present, the academic evaluation protocol is roughly this: given a video, the position and scale of the target are provided in the first frame, and in subsequent frames the tracking algorithm must locate the target while coping with illumination changes, motion blur, appearance changes, and so on. In fact, tracking is an ill-posed problem. For example, when tracking a car starting from its rear view, if the car's appearance changes drastically while driving, say it rotates 180 degrees into a side view, existing trackers will very likely lose it: their models are mostly learned from the first frame, and although they are updated during tracking, the training samples are too few to learn a good model, so they cannot adapt when the target's appearance changes dramatically. For these reasons, tracking is currently not an especially hot direction within computer vision, and many tracking algorithms are adapted from detection or recognition algorithms.
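The standard way to score a tracker against the first-frame-initialized ground truth is per-frame bounding-box overlap (the IoU thresholded in OTB-style success plots); a minimal sketch:

```python
def iou(box_a, box_b):
    # Boxes as (x, y, w, h). Intersection-over-union between the
    # predicted and ground-truth boxes; success plots count the fraction
    # of frames whose IoU exceeds a sweep of thresholds.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

assert iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0          # perfect overlap
assert iou((0, 0, 1, 1), (2, 2, 1, 1)) == 0.0          # disjoint boxes
```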

Representative recent papers

  1. Nam, Hyeonseob, and Bohyung Han. “Learning multi-domain convolutional neural networks for visual tracking.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  2. Held, David, Sebastian Thrun, and Silvio Savarese. “Learning to track at 100 fps with deep regression networks.” European Conference on Computer Vision. Springer International Publishing, 2016.
  3. Henriques, João F., et al. “High-speed tracking with kernelized correlation filters.” IEEE Transactions on Pattern Analysis and Machine Intelligence 37.3 (2015): 583-596.
  4. Ma, Chao, et al. “Hierarchical convolutional features for visual tracking.” Proceedings of the IEEE International Conference on Computer Vision. 2015.
  5. Bertinetto, Luca, et al. “Fully-convolutional siamese networks for object tracking.” European Conference on Computer Vision. Springer International Publishing, 2016.
  6. Danelljan, Martin, et al. “Beyond correlation filters: Learning continuous convolution operators for visual tracking.” European Conference on Computer Vision. Springer International Publishing, 2016.
  7. Li, Hanxi, Yi Li, and Fatih Porikli. “Deeptrack: Learning discriminative feature representations online for robust visual tracking.” IEEE Transactions on Image Processing 25.4 (2016): 1834-1848.

Datasets

  1. OTB(Object Tracking Benchmark)
  2. VOT(Visual Object Tracking)

Visual question answering

Visual question answering, abbreviated VQA, has been a very hot direction in recent years. Its goal is to have an algorithm automatically answer a user's question about an input image. Besides question answering, there is also caption generation, in which the computer automatically generates a piece of text describing an image without any question being asked. Algorithms that span two data modalities, such as text and images, are sometimes called multi-modal or cross-modal problems.

Representative recent papers

  1. Xiong, Caiming, Stephen Merity, and Richard Socher. “Dynamic memory networks for visual and textual question answering.” arXiv 1603 (2016).
  2. Wu, Qi, et al. “Ask me anything: Free-form visual question answering based on knowledge from external sources.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
  3. Zhu, Yuke, et al. “Visual7w: Grounded question answering in images.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Datasets

  1. VQA

Hot topics

With the sweeping advance of deep learning, nearly all research papers in artificial intelligence are now deep-learning papers, and traditional methods have become rare. Sometimes a very small modification to a deep network is enough for a decent paper. Moreover, as deep learning progresses, records on existing datasets in many areas keep being refreshed, closing in on human performance and in some respects even surpassing human recognition ability. So where will the next research hot spots be? My personal views are as follows:

  1. Multi-modal research: much current work still stays within a single modality, such as object detection or recognition alone, yet the real world is made of multi-modal data: speech, images, text, and so on. Given the rise of VQA in recent years, multi-modal research looks promising for the next few years, for example combining speech with images, images with text, or text with speech.
  2. Data generation: much of the data in machine learning still comes from real-world video and images annotated by hand for training or testing. The annotators' skill and experience, and the difficulty of unifying rules across multiple annotators, directly affect the final model to some degree. Using deep models to generate data automatically has become a new research hot spot, and how to generate data algorithmically should remain a worthwhile topic for some time.
  3. Unsupervised learning: much of the human brain's learning is unsupervised, whereas existing algorithms, whether for detection or recognition, are trained with supervision from manual annotations. Moving machine learning from supervised toward unsupervised learning should be a rather challenging direction; of course, unsupervised learning here does not mean simple unsupervised algorithms such as clustering. As LeCun once put it: if artificial intelligence were a cake, supervised learning would only be the icing, reinforcement learning the cherry, and unsupervised learning the cake itself.

    Finally, to keep up with the latest results and trends in the field, read more papers and write more code.
    The three top conferences in computer vision are:

    Conference on Computer Vision and Pattern Recognition (CVPR)
    International Conference on Computer Vision (ICCV)
    European Conference on Computer Vision (ECCV)

    Good conferences also include:

    The British Machine Vision Conference (BMVC)
    International Conference on Image Processing (ICIP)
    Winter Conference on Applications of Computer Vision (WACV)
    Asian Conference on Computer Vision (ACCV)

Of course, since publication goes through reviewing and publishing, half a year may already have passed by the time a conference's proceedings appear. To follow the newest research, it is advisable to check the cv section of arXiv every day. arXiv hosts preprints that may or may not eventually be accepted by conferences or journals, so their quality varies; newcomers who cannot yet judge quality are better off starting from classic papers at top conferences and journals.


This has been a very, very brief introduction to a few currently hot research areas in computer vision; I hope it helps students who want to get into the field. Given my very limited ability, mistakes are inevitable, and criticism and corrections are welcome.

2018-05-30 10:19:42 m0_37592397 · 9,075 views

Classic papers

Computer vision papers

  1. ImageNet classification
  2. Object detection
  3. Object tracking
  4. Low-level vision
  5. Edge detection
  6. Semantic segmentation
  7. Visual attention and saliency
  8. Object recognition
  9. Human pose estimation
  10. Understanding CNNs
  11. Image and language
  12. Image captioning
  13. Video captioning
  14. Image generation

Microsoft ResNet

Paper: Deep Residual Learning for Image Recognition

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1512.03385v1.pdf

Microsoft PReLU (parametric rectified linear units / weight initialization)

Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1502.01852.pdf

Google Batch Normalization

Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Authors: Sergey Ioffe, Christian Szegedy

Link: http://arxiv.org/pdf/1502.03167.pdf

Google GoogLeNet

Paper: Going Deeper with Convolutions, CVPR 2015

Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

Link: http://arxiv.org/pdf/1409.4842.pdf

Oxford VGG-Net

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

Authors: Karen Simonyan & Andrew Zisserman

Link: http://arxiv.org/pdf/1409.1556.pdf

AlexNet

Paper: ImageNet Classification with Deep Convolutional Neural Networks

Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Link: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Object detection

PVANET

Paper: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

Authors: Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

Link: http://arxiv.org/pdf/1608.08021

NYU OverFeat

Paper: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR 2014

Authors: Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

Link: http://arxiv.org/pdf/1312.6229.pdf

Berkeley R-CNN

Paper: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014

Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

Link: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

Microsoft SPP

Paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV 2014

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1406.4729.pdf

Microsoft Fast R-CNN

Paper: Fast R-CNN

Authors: Ross Girshick

Link: http://arxiv.org/pdf/1504.08083.pdf

Microsoft Faster R-CNN

Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

Link: http://arxiv.org/pdf/1506.01497.pdf

Oxford R-CNN minus R

Paper: R-CNN minus R

Authors: Karel Lenc, Andrea Vedaldi

Link: http://arxiv.org/pdf/1506.06981.pdf

End-to-end pedestrian detection

Paper: End-to-end People Detection in Crowded Scenes

Authors: Russell Stewart, Mykhaylo Andriluka

Link: http://arxiv.org/pdf/1506.04878.pdf

Real-time object detection

Paper: You Only Look Once: Unified, Real-Time Object Detection

Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

Link: http://arxiv.org/pdf/1506.02640.pdf

Inside-Outside Net

Paper: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

Authors: Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

Link: http://arxiv.org/abs/1512.04143.pdf

Microsoft ResNet

Paper: Deep Residual Learning for Image Recognition

Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Link: http://arxiv.org/pdf/1512.03385v1.pdf

R-FCN

Paper: R-FCN: Object Detection via Region-based Fully Convolutional Networks

Authors: Jifeng Dai, Yi Li, Kaiming He, Jian Sun

Link: http://arxiv.org/abs/1605.06409

SSD

Paper: SSD: Single Shot MultiBox Detector

Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

Link: http://arxiv.org/pdf/1512.02325v2.pdf

Speed/accuracy trade-offs

Paper: Speed/accuracy trade-offs for modern convolutional object detectors

Authors: Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Link: http://arxiv.org/pdf/1611.10012v1.pdf

Object tracking

  • Paper: Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

Authors: Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

Link: arXiv:1502.06796

  • Paper: DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking

Authors: Hanxi Li, Yi Li and Fatih Porikli

Published in: BMVC, 2014.

  • Paper: Learning a Deep Compact Image Representation for Visual Tracking

Authors: N. Wang, D. Y. Yeung

Published in: NIPS, 2013.

  • Paper: Hierarchical Convolutional Features for Visual Tracking

Authors: Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang

Published in: ICCV 2015

  • Paper: Visual Tracking with Fully Convolutional Networks

Authors: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu

Published in: ICCV 2015

  • Paper: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

Authors: Hyeonseob Nam and Bohyung Han

Object recognition

Paper: Weakly-Supervised Learning with Convolutional Neural Networks

Authors: Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf

FV-CNN

Paper: Deep Filter Banks for Texture Recognition and Segmentation

Authors: Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, CVPR, 2015.

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf

Human pose estimation

  • Paper: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Authors: Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, CVPR, 2017.

  • Paper: DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation

Authors: Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, CVPR, 2016.

  • Paper: Convolutional Pose Machines

Authors: Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, CVPR, 2016.

  • Paper: Stacked Hourglass Networks for Human Pose Estimation

Authors: Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV, 2016.

  • Paper: Flowing ConvNets for Human Pose Estimation in Videos

Authors: Tomas Pfister, James Charles, and Andrew Zisserman, ICCV, 2015.

  • Paper: Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

Authors: Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, NIPS, 2014.

Understanding CNNs

  • Paper: Understanding Image Representations by Measuring Their Equivariance and Equivalence

Authors: Karel Lenc, Andrea Vedaldi, CVPR, 2015.

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

  • Paper: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Authors: Anh Nguyen, Jason Yosinski, Jeff Clune, CVPR, 2015.

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

  • Paper: Understanding Deep Image Representations by Inverting Them

Authors: Aravindh Mahendran, Andrea Vedaldi, CVPR, 2015

Link:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

  • Paper: Object Detectors Emerge in Deep Scene CNNs

Authors: Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, ICLR, 2015.

Link: http://arxiv.org/abs/1412.6856

  • Paper: Inverting Visual Representations with Convolutional Networks

Authors: Alexey Dosovitskiy, Thomas Brox, arXiv, 2015.

Link: http://arxiv.org/abs/1506.02753

  • Paper: Visualizing and Understanding Convolutional Networks

Authors: Matthew Zeiler, Rob Fergus, ECCV, 2014.

Link: http://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

Image and language

Image captioning

UCLA / Baidu

Explain Images with Multimodal Recurrent Neural Networks

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, arXiv:1410.1090

http://arxiv.org/pdf/1410.1090

Toronto

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, arXiv:1411.2539.

http://arxiv.org/pdf/1411.2539

Berkeley

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, arXiv:1411.4389.

http://arxiv.org/pdf/1411.4389

Google

Show and Tell: A Neural Image Caption Generator

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, arXiv:1411.4555.

http://arxiv.org/pdf/1411.4555

Stanford

Deep Visual-Semantic Alignments for Generating Image Descriptions

Andrej Karpathy, Li Fei-Fei, CVPR, 2015.

Web:http://cs.stanford.edu/people/karpathy/deepimagesent/

Paper:http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

UML / UT

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, NAACL-HLT, 2015.

http://arxiv.org/pdf/1412.4729

CMU / Microsoft

Learning a Recurrent Visual Representation for Image Caption Generation

Xinlei Chen, C. Lawrence Zitnick, arXiv:1411.5654.

Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015

http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

Microsoft

From Captions to Visual Concepts and Back

Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR, 2015.

http://arxiv.org/pdf/1411.4952

Univ. Montreal / Univ. Toronto

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, arXiv:1502.03044 / ICML 2015

http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

Idiap / EPFL / Facebook

Phrase-based Image Captioning

Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, arXiv:1502.03671 / ICML 2015

http://arxiv.org/pdf/1502.03671

UCLA / Baidu

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, arXiv:1504.06692

http://arxiv.org/pdf/1504.06692

MS + Berkeley

Exploring Nearest Neighbor Approaches for Image Captioning

Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, arXiv:1505.04467

http://arxiv.org/pdf/1505.04467.pdf

Language Models for Image Captioning: The Quirks and What Works

Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, arXiv:1505.01809

http://arxiv.org/pdf/1505.01809.pdf

Adelaide

Image Captioning with an Intermediate Attributes Layer

Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, arXiv:1506.01144

Tilburg

Learning Language through Pictures

Grzegorz Chrupala, Akos Kadar, Afra Alishahi, arXiv:1506.03694

Univ. Montreal

Describing Multimedia Content using Attention-based Encoder-Decoder Networks

Kyunghyun Cho, Aaron Courville, Yoshua Bengio, arXiv:1507.01053

Cornell

Image Representations and New Domains in Neural Image Captioning

Jack Hessel, Nicolas Savva, Michael J. Wilber, arXiv:1508.02091

MS + City Univ. of Hong Kong

Learning Query and Image Similarities with Ranking Canonical Correlation Analysis

Ting Yao, Tao Mei, and Chong-Wah Ngo, ICCV, 2015

Video captioning

Berkeley

Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

UT Austin / UML / Berkeley

Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

Microsoft

Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

UT Austin / UML / Berkeley

Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.

Univ. Montreal / Sherbrooke

Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

MPI / Berkeley

Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698

Univ. Toronto / MIT

Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

Univ. Montreal

Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

TAU / USC

Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

Image Generation

Convolutional / Recurrent Networks
  • Paper: Conditional Image Generation with PixelCNN Decoders

Authors: Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

  • Paper: Learning to Generate Chairs with Convolutional Neural Networks

Authors: Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

Published in: CVPR, 2015.

  • Paper: DRAW: A Recurrent Neural Network For Image Generation

Authors: Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

Published in: ICML, 2015.

Adversarial Networks
  • Paper: Generative Adversarial Networks

Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

Published in: NIPS, 2014.

  • Paper: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

Published in: NIPS, 2015.

  • Paper: A note on the evaluation of generative models

Authors: Lucas Theis, Aäron van den Oord, Matthias Bethge

Published in: ICLR, 2016.

  • Paper: Variationally Auto-Encoded Deep Gaussian Processes

Authors: Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence

Published in: ICLR, 2016.

  • Paper: Generating Images from Captions with Attention

Authors: Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

Published in: ICLR, 2016.

  • Paper: Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

Authors: Jost Tobias Springenberg

Published in: ICLR, 2016.

  • Paper: Censoring Representations with an Adversary

Authors: Harrison Edwards, Amos Storkey

Published in: ICLR, 2016.

  • Paper: Distributional Smoothing with Virtual Adversarial Training

Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

Published in: ICLR, 2016.

  • Paper: Generative Visual Manipulation on the Natural Image Manifold

Authors: Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros

Published in: ECCV, 2016.

  • Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Authors: Alec Radford, Luke Metz, Soumith Chintala

Published in: ICLR, 2016.
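The minimax objective shared by the adversarial papers above can be made concrete with a few lines of plain Python. A minimal sketch, assuming hand-picked discriminator probabilities instead of trained networks (`gan_value` and all numbers below are hypothetical, for illustration only):

```python
import math

def gan_value(d_real, d_fake):
    """Two-player GAN objective from Goodfellow et al.:
    V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))],
    estimated over mini-batches of discriminator outputs in (0, 1)."""
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# Toy discriminator outputs (hypothetical numbers, not a trained model).
d_on_real = [0.9, 0.8, 0.95]   # D is confident real samples are real
d_on_fake = [0.1, 0.2, 0.05]   # D is confident generated samples are fake

v_strong_d = gan_value(d_on_real, d_on_fake)

# At the theoretical equilibrium D(x) = 0.5 everywhere, V = 2 * log(1/2).
v_equilibrium = gan_value([0.5] * 3, [0.5] * 3)

print(round(v_equilibrium, 4))      # -1.3863
print(v_strong_d > v_equilibrium)   # a strong discriminator raises V
```

The discriminator ascends this value while the generator descends it; the equilibrium value 2·log(1/2) is the optimum reached when the generator matches the data distribution.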

Question Answering

Virginia Tech / Microsoft Research

Paper: VQA: Visual Question Answering, CVPR 2015 (SUNw: Scene Understanding workshop).

Authors: Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

MPI / Berkeley

Paper: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

Authors: Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

Published: arXiv:1505.01121.

Toronto

Paper: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

Authors: Mengye Ren, Ryan Kiros, Richard Zemel

Published: arXiv:1505.02074 / ICML 2015 deep learning workshop.

Baidu / UCLA

Paper: Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

Authors: Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu

Published: arXiv:1505.05612.

POSTECH (Korea)

Paper: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Authors: Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han

Published: arXiv:1511.05765.

CMU / Microsoft Research

Paper: Stacked Attention Networks for Image Question Answering

Authors: Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015)

Published: arXiv:1511.02274.

MetaMind

Paper: Dynamic Memory Networks for Visual and Textual Question Answering

Authors: Xiong, Caiming, Stephen Merity, and Richard Socher

Published: arXiv:1603.01417 (2016).

Seoul National University + NAVER

Paper: Multimodal Residual Learning for Visual QA

Authors: Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Published: arXiv:1606.01455.

UC Berkeley + Sony

Paper: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Authors: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach

Published: arXiv:1606.01847.

POSTECH

Paper: Training Recurrent Answering Units with Joint Loss Minimization for VQA

Authors: Hyeonwoo Noh and Bohyung Han

Published: arXiv:1606.03647.

Seoul National University + NAVER

Paper: Hadamard Product for Low-rank Bilinear Pooling

Authors: Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Published: arXiv:1610.04325.
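The "Hadamard product" in the last entry is element-wise multiplication of two projected feature vectors, used as a cheap stand-in for a full bilinear outer product. A minimal sketch of that fusion step in plain Python, leaving out the nonlinearities, dropout, and learned weights of the actual model (the tiny matrices and vectors below are made up for illustration):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def low_rank_bilinear(x, y, U, V):
    """Low-rank bilinear pooling via the Hadamard (element-wise) product:
    z = (U x) * (V y). U and V project both modalities (e.g. image feature
    x and question feature y) into a shared rank-d space before fusing."""
    px = matvec(U, x)
    py = matvec(V, y)
    return [a * b for a, b in zip(px, py)]

# Hypothetical tiny projections: a 3-d image feature and a 2-d question
# feature fused into a rank-2 joint representation.
U = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
V = [[1.0, 0.5],
     [0.0, 1.0]]
x = [2.0, 3.0, 1.0]
y = [4.0, 2.0]

print(low_rank_bilinear(x, y, U, V))  # [15.0, 6.0]
```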

Visual Attention and Saliency

Paper: Predicting Eye Fixations using Convolutional Neural Networks

Authors: Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu

Published in: CVPR, 2015.

Learning a Sequential Search for Landmarks

Paper: Learning a Sequential Search for Landmarks

Authors: Saurabh Singh, Derek Hoiem, David Forsyth

Published in: CVPR, 2015.

Multiple Object Recognition with Visual Attention

Paper: Multiple Object Recognition with Visual Attention

Authors: Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu

Published in: ICLR, 2015.

Recurrent Models of Visual Attention

Paper: Recurrent Models of Visual Attention

Authors: Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Published in: NIPS, 2014.
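The recurrent attention models above process a sequence of small glimpses instead of the full image. A minimal sketch of the glimpse sensor, assuming a single-scale square crop with zero padding (the real models use multi-resolution patches and a learned location policy):

```python
def glimpse(image, center_row, center_col, size):
    """Extract a square glimpse (crop) around a location, the basic sensor
    of recurrent visual-attention models: at each step the network sees
    only this patch, not the whole image. Out-of-bounds pixels are
    zero-padded."""
    h, w = len(image), len(image[0])
    half = size // 2
    patch = []
    for r in range(center_row - half, center_row - half + size):
        row = []
        for c in range(center_col - half, center_col - half + size):
            row.append(image[r][c] if 0 <= r < h and 0 <= c < w else 0)
        patch.append(row)
    return patch

# 4x4 toy "image"; attend to the bright region in the lower right.
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]

print(glimpse(img, 2, 2, 3))  # the 3x3 neighborhood around (2, 2)
print(glimpse(img, 3, 3, 3))  # near the corner: padded with zeros
```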

Low-Level Vision

Super-Resolution
  • Iterative Image Reconstruction

Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.

Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001.

  • Super-Resolution (SRCNN)

Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.

Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.

  • Very Deep Super-Resolution

Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015.

  • Deeply-Recursive Convolutional Network

Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015.

  • Cascade-Sparse-Coding-Network

Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.

  • Perceptual Losses for Super-Resolution

Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.

  • SRGAN

Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016.
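Most of the super-resolution papers above report PSNR against the ground-truth image. A minimal implementation of that metric in plain Python (the tiny example images are made up):

```python
import math

def psnr(reference, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio, the standard metric reported by the
    super-resolution papers above: 10 * log10(MAX^2 / MSE)."""
    flat_ref = [p for row in reference for p in row]
    flat_rec = [p for row in reconstruction for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_rec)) / len(flat_ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

ground_truth = [[100, 110], [120, 130]]
upscaled     = [[101, 108], [121, 129]]  # hypothetical SR output, off by 1-2

print(round(psnr(ground_truth, upscaled), 2))  # ≈ 45.7 dB
```

Higher is better; perceptually-motivated methods such as SRGAN deliberately trade a little PSNR for sharper textures, which is why they also report perceptual scores.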

Other Applications

Optical Flow (FlowNet)

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.

Compression Artifacts Reduction

Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.

Blur Removal

Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444

Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015

Image Deconvolution

Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.

Deep Edge-Aware Filter

Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.

Computing the Stereo Matching Cost with a Convolutional Neural Network

Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.
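For context on the stereo entry: a matching cost scores how well a left-image patch matches right-image patches at candidate disparities, and the disparity with the lowest cost wins. A minimal classical baseline using sum-of-absolute-differences, which Žbontar & LeCun replace with a learned CNN similarity (the toy scanlines below are made up; the right row is the left row shifted by 2 pixels):

```python
def best_disparity(left_row, right_row, x, window, max_disp):
    """Winner-takes-all stereo matching with a sum-of-absolute-differences
    cost over a 1-D window: the classical baseline whose patch-similarity
    score the CNN of Zbontar & LeCun learns instead."""
    half = window // 2
    costs = []
    for d in range(max_disp + 1):
        cost = 0
        for dx in range(-half, half + 1):
            xl, xr = x + dx, x + dx - d
            if 0 <= xl < len(left_row) and 0 <= xr < len(right_row):
                cost += abs(left_row[xl] - right_row[xr])
            else:
                cost += 255  # penalize out-of-bounds comparisons
        costs.append(cost)
    return min(range(len(costs)), key=costs.__getitem__)

left  = [10, 10, 50, 90, 50, 10, 10, 10]
right = [50, 90, 50, 10, 10, 10, 10, 10]  # left shifted by disparity 2

print(best_disparity(left, right, 3, 3, 3))  # 2
```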

Colorful Image Colorization

Richard Zhang, Phillip Isola, Alexei A. Efros, Colorful Image Colorization, ECCV, 2016.

Feature Learning by Inpainting

Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

Edge Detection

Holistically-Nested Edge Detection

Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.

DeepEdge

Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.

DeepContour

Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.
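The learned detectors above are usually compared against classical gradient-based edge maps. A minimal Sobel gradient-magnitude sketch in plain Python (toy image, no thresholding or non-maximum suppression):

```python
def sobel_magnitude(image):
    """Gradient-magnitude edge map with 3x3 Sobel filters: the classical
    baseline that learned detectors such as HED and DeepContour improve
    on. Border pixels are left at zero."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sx = sum(gx[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            sy = sum(gy[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            out[r][c] = (sx * sx + sy * sy) ** 0.5
    return out

# Vertical step edge between columns 1 and 2.
img = [[0, 0, 9, 9]] * 4
edges = sobel_magnitude(img)
print(edges[1])  # [0.0, 36.0, 36.0, 0.0] — strong response on the step
```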

Semantic Segmentation

SEC: Seed, Expand and Constrain

Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016.

Adelaide

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. (1st ranked in VOC2012)

Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. (4th ranked in VOC2012)

Deep Parsing Network (DPN)

Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 (2nd ranked in VOC 2012)

CentraleSuperBoundaries, INRIA

Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1511.07386 (4th ranked in VOC 2012)

BoxSup

Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)

POSTECH

Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. (7th ranked in VOC2012)

Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924.

Seunghoon Hong, Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928

Conditional Random Fields as Recurrent Neural Networks

Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)

DeepLab

Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. (9th ranked in VOC2012)

Zoom-out

Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015

Joint Calibration

Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.

Fully Convolutional Networks for Semantic Segmentation

Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.
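A fully convolutional network produces one coarse score map per class and upsamples it back to input resolution before a per-pixel argmax. A minimal sketch of that final stage, substituting nearest-neighbor upsampling for FCN's learned (bilinear-initialized) deconvolution — score values and map sizes below are made up:

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling of a 2-D score map. FCN uses a learned
    deconvolution instead, but the role is the same: bring coarse score
    maps back to input resolution."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([wide] * factor)  # shared row references; read-only here
    return out

def label_map(score_maps, factor):
    """Per-pixel argmax over C upsampled class score maps -> segmentation."""
    ups = [upsample_nearest(m, factor) for m in score_maps]
    h, w = len(ups[0]), len(ups[0][0])
    return [[max(range(len(ups)), key=lambda c: ups[c][r][x])
             for x in range(w)] for r in range(h)]

# Two hypothetical 2x2 coarse score maps (class 0 = background, 1 = object).
scores = [
    [[0.9, 0.2], [0.8, 0.1]],  # class 0 scores
    [[0.1, 0.7], [0.3, 0.6]],  # class 1 scores
]
print(label_map(scores, 2))  # right half labeled as the object class
```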

Hypercolumn

Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.

Deep Hierarchical Parsing

Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.

Learning Hierarchical Features for Scene Labeling

Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.

Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.

University of Cambridge

Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015.

Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015.

Princeton

Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016
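The dilated-convolution paper above grows the receptive field exponentially without pooling or losing resolution. A small calculator for the receptive field of stacked stride-1 dilated layers (the exponential schedule below mirrors the paper's context-module pattern, but the exact layer counts here are illustrative):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions:
    each layer with dilation d adds d * (k - 1) to the field."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)

# Plain 3x3 convolutions grow the field linearly...
print(receptive_field(3, [1, 1, 1, 1, 1]))      # 11

# ...while exponentially increasing dilations grow it rapidly,
# keeping the feature map at full resolution throughout.
print(receptive_field(3, [1, 1, 2, 4, 8, 16]))  # 65
```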

Univ. of Washington, Allen AI

Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015

INRIA

Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016

UCSB

Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015

Other Resources

Courses

Deep Vision

[Stanford] CS231n: Convolutional Neural Networks for Visual Recognition

[CUHK] ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)

• More recommended deep learning courses

[Stanford] CS224d: Deep Learning for Natural Language Processing

[Oxford] Deep Learning by Prof. Nando de Freitas

[NYU] Deep Learning by Prof. Yann LeCun

Books

Free Online Books

Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

Neural Networks and Deep Learning by Michael Nielsen

Deep Learning Tutorial by LISA lab, University of Montreal

Videos

Talks

Deep Learning, Self-Taught Learning and Unsupervised Feature Learning by Andrew Ng

Recent Developments in Deep Learning by Geoff Hinton

The Unreasonable Effectiveness of Deep Learning by Yann LeCun

Deep Learning of Representations by Yoshua Bengio

Software

Frameworks
  • Tensorflow: An open source software library for numerical computation using data flow graphs by Google [Web]
  • Torch7: Deep learning library in Lua, used by Facebook and Google Deepmind [Web]
  • Torch-based deep learning libraries: [torchnet]
  • Caffe: Deep learning framework by the BVLC [Web]
  • Theano: Mathematical library in Python, maintained by LISA lab [Web]
  • Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
  • MatConvNet: CNNs for MATLAB [Web]
  • MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
  • Deepgaze: A computer vision library for human-computer interaction based on CNNs [Web]

Applications

  • Adversarial training: Code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
  • Understanding and visualization: Source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
  • Semantic segmentation: Source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web]; Source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
  • Super-resolution: Image Super-Resolution for Anime-Style Art [Web]
  • Edge detection: Source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
  • Edge detection: Source code for the paper “Holistically-Nested Edge Detection,” ICCV 2015. [Web]

Tutorials

  • [CVPR 2014] Tutorial on Deep Learning in Computer Vision
  • [CVPR 2015] Applied Deep Learning for Computer Vision with Torch

Blogs

  • Deep down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision Blog
  • CVPR recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s Blog
  • Facebook’s AI Painting@Wired
  • Inceptionism: Going Deeper into Neural Networks@Google Research
  • Implementing Neural Networks