  • Top 10 must-read computer vision papers of 2021

    The top 10 computer vision papers of 2021, with video demos, articles, code, and full paper references.

    While the virus brought the world's economic activity to a rare standstill, research never slowed its frantic pace, especially in artificial intelligence. Beyond the usual results, many of this year's papers also highlight important aspects such as ethics, significant biases, governance, and transparency. Our understanding of AI, of the human brain, and of the link between the two keeps progressing, pointing to promising applications that could improve our quality of life in the near future. Still, we should be careful about which technologies we choose to apply.

    "Science cannot tell us what we ought to do, only what we can do."
    -- Jean-Paul Sartre, Being and Nothingness

    Below are the 10 research papers I found most interesting in computer vision this year. In short, it is a curated list of the latest breakthroughs in AI and CV, with clear video explanations and code where available. Full references for each paper are listed at the end of this article. If you have other recommendations, feel free to contact me directly.

    DALL·E: Zero-Shot Text-to-Image Generation from OpenAI [1]

    OpenAI successfully trained a network able to generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.

    Code: https://github.com/openai/DALL-E
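
    To make the idea concrete, here is a minimal, schematic PyTorch sketch of the two-stage recipe described in the paper: a discrete VAE compresses the image into a grid of codebook tokens, and a transformer models the concatenation of text tokens and image tokens autoregressively, so image tokens can be sampled from a caption. Everything below (toy sizes, class names, random weights) is illustrative only and is not the released model.

```python
import torch
import torch.nn as nn

# Toy sizes for the demo; the paper uses 256 text tokens, a 32x32 = 1024 token
# image grid, a 16384-entry text vocabulary and an 8192-entry image codebook.
TEXT_VOCAB, IMAGE_VOCAB, D = 1000, 512, 128
TEXT_LEN, IMAGE_LEN = 16, 64

class ToyDALLE(nn.Module):
    def __init__(self):
        super().__init__()
        # one shared embedding table over text ids and (offset) image ids
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, D)
        self.pos = nn.Parameter(torch.zeros(TEXT_LEN + IMAGE_LEN, D))
        layer = nn.TransformerEncoderLayer(D, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_logits = nn.Linear(D, IMAGE_VOCAB)

    def forward(self, tokens):                      # tokens: (B, L)
        L = tokens.shape[1]
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.embed(tokens) + self.pos[:L]
        h = self.backbone(h, mask=causal)           # causal self-attention
        return self.to_logits(h)                    # next-image-token logits

@torch.no_grad()
def sample_image_tokens(model, text_tokens):
    seq = text_tokens
    for _ in range(IMAGE_LEN):                      # autoregressive decoding
        logits = model(seq)[:, -1]
        nxt = torch.multinomial(logits.softmax(-1), 1) + TEXT_VOCAB
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, -IMAGE_LEN:] - TEXT_VOCAB         # would go to the dVAE decoder

model = ToyDALLE()
caption = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN))
print(sample_image_tokens(model, caption).shape)    # torch.Size([1, 64])
```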

    Taming Transformers for High-Resolution Image Synthesis [2]

    By combining the efficiency of GANs and convolutional approaches with the expressive power of Transformers, this work provides a powerful and time-efficient method for semantically guided, high-quality image synthesis.

    Code: https://github.com/CompVis/taming-transformers
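
    A rough sketch of the VQGAN half of the method, using assumed toy components (the real model stacks many more conv layers and adds perceptual and adversarial losses; a transformer is then trained on the resulting token sequence):

```python
import torch
import torch.nn as nn

class ToyVQ(nn.Module):
    def __init__(self, num_codes=1024, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.enc = nn.Conv2d(3, dim, kernel_size=4, stride=4)   # real model: deep CNN, 16x down
        self.dec = nn.ConvTranspose2d(dim, 3, kernel_size=4, stride=4)

    def forward(self, x):
        z = self.enc(x)                                   # (B, dim, H/4, W/4)
        flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        dist = torch.cdist(flat, self.codebook.weight)    # distance to every codebook entry
        idx = dist.argmin(dim=1)                          # nearest-code index per location
        zq = self.codebook(idx).view(z.shape[0], z.shape[2], z.shape[3], -1).permute(0, 3, 1, 2)
        zq = z + (zq - z).detach()                        # straight-through gradient estimator
        return self.dec(zq), idx.view(z.shape[0], -1)     # reconstruction + token sequence

model = ToyVQ()
recon, tokens = model(torch.rand(1, 3, 64, 64))
print(recon.shape, tokens.shape)   # (1, 3, 64, 64), (1, 256): 256 tokens for the transformer
```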

    Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [3]

    Will Transformers replace CNNs in computer vision? In less than five minutes, a new paper called Swin Transformer shows how the Transformer architecture can be applied to computer vision.

    Code: https://github.com/microsoft/Swin-Transformer
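
    The core trick, attention restricted to local windows alternated with a cyclic shift of half a window so that neighbouring windows can communicate, can be sketched in a few lines (toy code, not the official implementation; the attention mask that handles the wrap-around at image borders is omitted):

```python
import torch

def window_partition(x, M):
    """(B, H, W, C) -> (num_windows*B, M*M, C)"""
    B, H, W, C = x.shape
    x = x.view(B, H // M, M, W // M, M, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, M * M, C)

B, H, W, C, M = 1, 8, 8, 32, 4
x = torch.randn(B, H, W, C)

# Regular windows, then shifted windows (cyclic shift by half a window, as in the paper).
windows = window_partition(x, M)                                  # (4, 16, 32)
shifted = torch.roll(x, shifts=(-M // 2, -M // 2), dims=(1, 2))
shifted_windows = window_partition(shifted, M)                    # (4, 16, 32)

# Attention is then applied independently inside each window, e.g.:
attn = torch.nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
out, _ = attn(windows, windows, windows)
print(windows.shape, shifted_windows.shape, out.shape)
```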

    Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [4]

    The next step in view synthesis: take a single picture, then step inside it and explore the landscape!

    https://colab.research.google.com/github/google-research/google-research/blob/master/infinite_nature/infinite_nature_demo.ipynb#scrollTo=sCuRX1liUEVM

    Total Relighting: Learning to Relight Portraits for Background Replacement [5]

    Relight a portrait to match the lighting of the new background you add. Have you ever wanted to change the background of a picture while keeping it realistic? If you have tried, you already know it is not simple. Take a photo of yourself at home, swap the background for a beach, and anyone will tell you within a second that it was Photoshopped. For films and professional videos, you need perfect lighting and artists to reproduce a high-quality image, which is extremely expensive; you cannot do that with your own photos. This paper does.

    Animating Pictures with Eulerian Motion Fields [6]

    From a single photo, the model works out which particles should be moving and animates them realistically in an endless loop while keeping the rest of the picture untouched, turning a still photo into an animation.

    Code: https://eulerian.cs.washington.edu/
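
    A toy sketch of the underlying mechanism, assuming a hand-made constant flow field (the paper predicts the field with a network and warps deep features with symmetric splatting to avoid holes): a static "Eulerian" motion field is integrated over time, and each frame is the input photo warped to the accumulated displacement.

```python
import torch
import torch.nn.functional as F

def warp(img, disp):
    """Backward-warp img (1,C,H,W) by a displacement field disp (1,2,H,W) in pixels."""
    _, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid_x = (xs - disp[:, 0]) / (W - 1) * 2 - 1   # sample from where each pixel came from
    grid_y = (ys - disp[:, 1]) / (H - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)    # (1, H, W, 2) in [-1, 1]
    return F.grid_sample(img, grid, align_corners=True)

H = W = 64
img = torch.rand(1, 3, H, W)
flow = torch.zeros(1, 2, H, W)
flow[:, 0] = 0.5                                    # constant rightward motion, pixels/frame

frames, disp = [], torch.zeros_like(flow)
for t in range(30):
    disp = disp + flow                              # Euler integration of the static field
    frames.append(warp(img, disp))
print(len(frames), frames[0].shape)
```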

    CVPR 2021 Best Paper Award: GIRAFFE — Controllable Image Generation [7]

    Using a modified GAN architecture, they can move objects in an image without affecting the background or the other objects!

    Code: https://github.com/autonomousvision/giraffe

    TimeLens: Event-based Video Frame Interpolation [8]

    TimeLens understands the motion of particles between video frames and reconstructs what really happened at speeds our eyes cannot see. It achieves results that smartphones and other cameras cannot reach!

    https://github.com/uzh-rpg/rpg_timelens

    CLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis [9]

    Have you ever wanted to take the style of one picture, say a cool painting style, and apply it to a new picture of your choice? This model can do it, even from text alone, and the authors provide a Google Colab so anyone can try the new method right away. Simply take a picture of the style you want to copy, type the text you want to generate, and the algorithm produces a new picture out of it! The results are very impressive, especially considering they are made from a single line of text.

    https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb

    https://colab.research.google.com/github/pschaldenbrand/StyleCLIPDraw/blob/master/Style_ClipDraw.ipynb
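
    A simplified sketch of the optimization loop, assuming OpenAI's `clip` package is installed (pip install git+https://github.com/openai/CLIP.git). CLIPDraw itself optimizes the control points and colours of Bezier strokes through the diffvg differentiable rasterizer and applies many random augmentations; this stand-in optimizes raw pixels instead, only to illustrate the CLIP-guided objective.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()          # keep everything in fp32 so gradients flow cleanly

text = clip.tokenize(["a watercolor painting of a lighthouse"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# The "drawing": here just a 224x224 image tensor optimized directly.
canvas = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([canvas], lr=0.05)

for step in range(200):
    img_feat = model.encode_image(canvas.clamp(0, 1))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()     # maximize cosine similarity with the prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(step, loss.item())
```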

    CityNeRF: Building NeRF at City Scale [10]

    The model, called CityNeRF, grows out of NeRF, one of the first models to build 3D scenes from images using radiance fields and machine learning. But NeRF is not very efficient and only works at a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce 3D models at every scale; in short, it brings NeRF to city scale.

    https://city-super.github.io/citynerf/
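
    For context, here is a minimal sketch of the NeRF machinery that CityNeRF builds on (toy MLP, no positional encoding, none of CityNeRF's multi-scale training stages): a network maps a 3D point to colour and density, and a pixel is rendered by compositing samples along its camera ray.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 4))                      # outputs (r, g, b, sigma)

def render_ray(origin, direction, near=2.0, far=6.0, n_samples=64):
    t = torch.linspace(near, far, n_samples)                # depths sampled along the ray
    pts = origin + t[:, None] * direction                   # (n_samples, 3) query points
    out = mlp(pts)
    rgb, sigma = torch.sigmoid(out[:, :3]), torch.relu(out[:, 3])
    delta = t[1] - t[0]
    alpha = 1 - torch.exp(-sigma * delta)                   # opacity of each ray segment
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                 # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)              # composited pixel colour

pixel = render_ray(torch.tensor([0.0, 0.0, 0.0]), torch.tensor([0.0, 0.0, 1.0]))
print(pixel)   # training compares this to the ground-truth pixel and backpropagates
```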

    References

    [1] A. Ramesh et al., Zero-shot text-to-image generation, 2021. arXiv:2102.12092

    [2] Esser et al., Taming Transformers for High-Resolution Image Synthesis, 2020.

    [3] Liu, Z. et al., 2021, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”, arXiv preprint https://arxiv.org/abs/2103.14030v1

    [bonus] Yuille, A.L., and Liu, C., 2021. Deep nets: What have they ever done for vision?. International Journal of Computer Vision, 129(3), pp.781–802, https://arxiv.org/abs/1805.04025.

    [4] Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N. and Kanazawa, A., 2020. Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image, https://arxiv.org/pdf/2012.09855.pdf

    [5] Pandey et al., 2021, Total Relighting: Learning to Relight Portraits for Background Replacement, doi: 10.1145/3450626.3459872, https://augmentedperception.github.io/total_relighting/total_relighting_paper.pdf.

    [6] Holynski, Aleksander, et al. “Animating Pictures with Eulerian Motion Fields.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

    [7] Michael Niemeyer and Andreas Geiger, (2021), “GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields”, Published in CVPR 2021.

    [8] Stepan Tulyakov*, Daniel Gehrig*, Stamatios Georgoulis, Julius Erbach, Mathias Gehrig, Yuanyou Li, Davide Scaramuzza, TimeLens: Event-based Video Frame Interpolation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2021, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf

    [9] a) CLIPDraw: exploring text-to-drawing synthesis through language-image encoders
    b) StyleCLIPDraw: Schaldenbrand, P., Liu, Z. and Oh, J., 2021. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis.

    [10] Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B. and Lin, D., 2021. CityNeRF: Building NeRF at City Scale.

    Author: Louis Bouchard

    https://www.overfit.cn/post/e04755fce41e4db2959acfe688a7e3ac

  • 计算机视觉论文整理

    本文梳理了2012到2017年计算机视觉领域的大事件:以论文和其他干货资源为主,并附上资源地址。囊括上百篇论文,分ImageNet 分类、物体检测、物体追踪、物体识别、图像与语言和图像生成等多个方向进行介绍。

    经典论文

    计算机视觉论文

    1. ImageNet分类
    2. 物体检测
    3. 物体跟踪
    4. 低级视觉
    5. 边缘检测
    6. 语义分割
    7. 视觉注意力和显著性
    8. 物体识别
    9. 人体姿态估计
    10. CNN原理和性质(Understanding CNN)
    11. 图像和语言
    12. 图像解说
    13. 视频解说
    14. 图像生成

    微软ResNet

    论文:用于图像识别的深度残差网络

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1512.03385v1.pdf
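
    下面给出残差学习思想的一个最小示意(示意代码,非论文官方实现,结构与超参数仅作演示):每个残差块学习相对输入的残差 F(x),输出 F(x) + x,使得非常深的网络也容易优化。

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)        # shortcut: 残差与输入逐元素相加

x = torch.randn(1, 64, 32, 32)
print(BasicBlock(64)(x).shape)            # torch.Size([1, 64, 32, 32])
```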

    微软PRelu(随机纠正线性单元/权重初始化)

    论文:深入学习整流器:在ImageNet分类上超越人类水平

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1502.01852.pdf

    谷歌Batch Normalization

    论文:批量归一化:通过减少内部协变量来加速深度网络训练

    作者:Sergey Ioffe, Christian Szegedy

    链接:http://arxiv.org/pdf/1502.03167.pdf
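
    下面用几行代码示意批量归一化的计算(仅训练阶段的前向,省略了推理时使用的滑动平均统计量,属演示性质):对每个通道用 mini-batch 的均值和方差做标准化,再用可学习的 gamma、beta 做仿射变换。

```python
import torch

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=(0, 2, 3), keepdim=True)                 # 按通道统计均值
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)   # 按通道统计方差
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta

x = torch.randn(8, 16, 32, 32)
gamma = torch.ones(1, 16, 1, 1)
beta = torch.zeros(1, 16, 1, 1)
y = batch_norm_2d(x, gamma, beta)
print(y.mean().item(), y.std().item())                         # 约为 0 和 1
```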

    谷歌GoogLeNet

    论文:更深的卷积,CVPR 2015

    作者:Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

    链接:http://arxiv.org/pdf/1409.4842.pdf

    牛津VGG-Net

    论文:大规模视觉识别中的极深卷积网络,ICLR 2015

    作者:Karen Simonyan & Andrew Zisserman

    链接:http://arxiv.org/pdf/1409.1556.pdf

    AlexNet

    论文:使用深度卷积神经网络进行ImageNet分类

    作者:Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

    链接:http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

    物体检测


    PVANET

    论文:用于实时物体检测的深度轻量神经网络(PVANET:Deep but Lightweight Neural Networks for Real-time Object Detection)

    作者:Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

    链接:http://arxiv.org/pdf/1608.08021

    纽约大学OverFeat

    论文:使用卷积网络进行识别、定位和检测(OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks),ICLR 2014

    作者:Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

    链接:http://arxiv.org/pdf/1312.6229.pdf

    伯克利R-CNN

    论文:精确物体检测和语义分割的丰富特征层次结构(Rich feature hierarchies for accurate object detection and semantic segmentation),CVPR 2014

    作者:Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

    链接:http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

    微软SPP

    论文:视觉识别深度卷积网络中的空间金字塔池化(Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition),ECCV 2014

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1406.4729.pdf

    微软Fast R-CNN

    论文:Fast R-CNN

    作者:Ross Girshick

    链接:http://arxiv.org/pdf/1504.08083.pdf

    微软Faster R-CNN

    论文:使用RPN走向实时物体检测(Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks)

    作者:任少卿、何恺明、Ross Girshick、孙剑

    链接:http://arxiv.org/pdf/1506.01497.pdf
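
    下面是 RPN(区域建议网络)核心思想的示意代码(非官方实现,通道数与 anchor 数仅作演示):在共享特征图的每个位置放置 k 个 anchor,由一个小卷积头同时预测每个 anchor 的前景得分和边框回归量,得分最高的建议框再交给 Fast R-CNN 检测头。

```python
import torch
import torch.nn as nn

k = 9                                   # 每个位置的 anchor 数(3 种尺度 x 3 种长宽比)
feat = torch.randn(1, 256, 38, 50)      # backbone 输出的共享特征图

rpn_conv = nn.Conv2d(256, 256, 3, padding=1)
cls_head = nn.Conv2d(256, k, 1)         # 每个 anchor 的 objectness 得分
reg_head = nn.Conv2d(256, 4 * k, 1)     # 每个 anchor 的 (dx, dy, dw, dh)

h = torch.relu(rpn_conv(feat))
scores = cls_head(h)                    # (1, 9, 38, 50)
deltas = reg_head(h)                    # (1, 36, 38, 50)
print(scores.shape, deltas.shape)       # 共 9*38*50 = 17100 个候选 anchor
```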

    牛津大学R-CNN minus R

    论文:R-CNN minus R

    作者:Karel Lenc, Andrea Vedaldi

    链接:http://arxiv.org/pdf/1506.06981.pdf

    端到端行人检测

    论文:密集场景中端到端的行人检测(End-to-end People Detection in Crowded Scenes)

    作者:Russell Stewart, Mykhaylo Andriluka

    链接:http://arxiv.org/pdf/1506.04878.pdf

    实时物体检测

    论文:你只看一次:统一实时物体检测(You Only Look Once: Unified, Real-Time Object Detection)

    作者:Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

    链接:http://arxiv.org/pdf/1506.02640.pdf

    Inside-Outside Net

    论文:使用跳跃池化和RNN在场景中检测物体(Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks)

    作者:Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

    链接:http://arxiv.org/abs/1512.04143.pdf

    微软ResNet

    论文:用于图像识别的深度残差网络

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1512.03385v1.pdf

    R-FCN

    论文:通过区域全卷积网络进行物体识别(R-FCN: Object Detection via Region-based Fully Convolutional Networks)

    作者:代季峰,李益,何恺明,孙剑

    链接:http://arxiv.org/abs/1605.06409

    SSD

    论文:单次多框检测器(SSD: Single Shot MultiBox Detector)

    作者:Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

    链接:http://arxiv.org/pdf/1512.02325v2.pdf

    速度/精度权衡

    论文:现代卷积物体检测器的速度/精度权衡(Speed/accuracy trade-offs for modern convolutional object detectors)

    作者:Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

    链接:http://arxiv.org/pdf/1611.10012v1.pdf

    物体跟踪

    • 论文:用卷积神经网络通过学习可区分的显著性地图实现在线跟踪(Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network)

    作者:Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

    地址:arXiv:1502.06796.

    • 论文:DeepTrack:通过视觉跟踪的卷积神经网络学习辨别特征表征(DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking)

    作者:Hanxi Li, Yi Li and Fatih Porikli

    发表: BMVC, 2014.

    • 论文:视觉跟踪中,学习深度紧凑图像表示(Learning a Deep Compact Image Representation for Visual Tracking)

    作者:N Wang, DY Yeung

    发表:NIPS, 2013.

    • 论文:视觉跟踪的分层卷积特征(Hierarchical Convolutional Features for Visual Tracking)

    作者:Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang

    发表: ICCV 2015

    • 论文:完全卷积网络的视觉跟踪(Visual Tracking with fully Convolutional Networks)

    作者:Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu,

    发表:ICCV 2015

    • 论文:学习多域卷积神经网络进行视觉跟踪(Learning Multi-Domain Convolutional Neural Networks for Visual Tracking)

    作者:Hyeonseob Nam and Bohyung Han

    对象识别(Object Recognition)

    论文:卷积神经网络弱监督学习(Weakly-supervised learning with convolutional neural networks)

    作者:Maxime Oquab,Leon Bottou,Ivan Laptev,Josef Sivic,CVPR,2015

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf

    FV-CNN

    论文:深度滤波器组用于纹理识别和分割(Deep Filter Banks for Texture Recognition and Segmentation)

    作者:Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf

    人体姿态估计(Human Pose Estimation)

    • 论文:使用 Part Affinity Field的实时多人2D姿态估计(Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields)

    作者:Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, CVPR, 2017.

    • 论文:Deepcut:多人姿态估计的联合子集分割和标签(Deepcut: Joint subset partition and labeling for multi person pose estimation)

    作者:Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, CVPR, 2016.

    • 论文:Convolutional pose machines

    作者:Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, CVPR, 2016.

    • 论文:人体姿态估计的 Stacked hourglass networks(Stacked hourglass networks for human pose estimation)

    作者:Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV, 2016.

    • 论文:用于视频中人体姿态估计的Flowing convnets(Flowing convnets for human pose estimation in videos)

    作者:Tomas Pfister, James Charles, and Andrew Zisserman, ICCV, 2015.

    • 论文:卷积网络和人类姿态估计图模型的联合训练(Joint training of a convolutional network and a graphical model for human pose estimation)

    作者:Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, NIPS, 2014.

    理解CNN


    • 论文:通过测量同变性和等价性来理解图像表示(Understanding image representations by measuring their equivariance and equivalence)

    作者:Karel Lenc, Andrea Vedaldi, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

    • 论文:深度神经网络容易被愚弄:无法识别的图像的高置信度预测(Deep Neural Networks are Easily Fooled:High Confidence Predictions for Unrecognizable Images)

    作者:Anh Nguyen, Jason Yosinski, Jeff Clune, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

    • 论文:通过反演理解深度图像表示(Understanding Deep Image Representations by Inverting Them)

    作者:Aravindh Mahendran, Andrea Vedaldi, CVPR, 2015

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

    • 论文:深度场景CNN中的对象检测器(Object Detectors Emerge in Deep Scene CNNs)

    作者:Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, ICLR, 2015.

    链接:http://arxiv.org/abs/1412.6856

    • 论文:用卷积网络反演视觉表示(Inverting Visual Representations with Convolutional Networks)

    作者:Alexey Dosovitskiy, Thomas Brox, arXiv, 2015.

    链接:http://arxiv.org/abs/1506.02753

    • 论文:可视化和理解卷积网络(Visualizing and Understanding Convolutional Networks)

    作者:Matthew Zeiler, Rob Fergus, ECCV, 2014.

    链接:http://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

    图像与语言

    图像说明(Image Captioning)


    UCLA / Baidu

    用多模态循环神经网络解释图像(Explain Images with Multimodal Recurrent Neural Networks)

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, arXiv:1410.1090

    http://arxiv.org/pdf/1410.1090

    Toronto

    使用多模态神经语言模型统一视觉语义嵌入(Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models)

    Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, arXiv:1411.2539.

    http://arxiv.org/pdf/1411.2539

    Berkeley

    用于视觉识别和描述的长期循环卷积网络(Long-term Recurrent Convolutional Networks for Visual Recognition and Description)

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, arXiv:1411.4389.

    http://arxiv.org/pdf/1411.4389

    Google

    看图写字:神经图像说明生成器(Show and Tell: A Neural Image Caption Generator)

    Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, arXiv:1411.4555.

    http://arxiv.org/pdf/1411.4555

    Stanford

    用于生成图像描述的深度视觉语义对齐(Deep Visual-Semantic Alignments for Generating Image Description)

    Andrej Karpathy, Li Fei-Fei, CVPR, 2015.

    Web:http://cs.stanford.edu/people/karpathy/deepimagesent/

    Paper:http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

    UML / UT

    使用深度循环神经网络将视频转换为自然语言(Translating Videos to Natural Language Using Deep Recurrent Neural Networks)

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, NAACL-HLT, 2015.

    http://arxiv.org/pdf/1412.4729

    CMU / Microsoft

    学习图像说明生成的循环视觉表示(Learning a Recurrent Visual Representation for Image Caption Generation)

    Xinlei Chen, C. Lawrence Zitnick, arXiv:1411.5654.

    Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015

    http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

    Microsoft

    从图像说明到视觉概念(From Captions to Visual Concepts and Back)

    Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR, 2015.

    http://arxiv.org/pdf/1411.4952

    Univ. Montreal / Univ. Toronto

    Show, Attend, and Tell:视觉注意力与神经图像标题生成(Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention)

    Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, arXiv:1502.03044 / ICML 2015

    http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

    Idiap / EPFL / Facebook

    基于短语的图像说明(Phrase-based Image Captioning)

    Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, arXiv:1502.03671 / ICML 2015

    http://arxiv.org/pdf/1502.03671

    UCLA / Baidu

    像孩子一样学习:从图像句子描述快速学习视觉的新概念(Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images)

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, arXiv:1504.06692

    http://arxiv.org/pdf/1504.06692

    MS + Berkeley

    探索图像说明的最近邻方法( Exploring Nearest Neighbor Approaches for Image Captioning)

    Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, arXiv:1505.04467

    http://arxiv.org/pdf/1505.04467.pdf

    图像说明的语言模型(Language Models for Image Captioning: The Quirks and What Works)

    Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, arXiv:1505.01809

    http://arxiv.org/pdf/1505.01809.pdf

    阿德莱德

    具有中间属性层的图像说明( Image Captioning with an Intermediate Attributes Layer)

    Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, arXiv:1506.01144

    蒂尔堡

    通过图片学习语言(Learning language through pictures)

    Grzegorz Chrupala, Akos Kadar, Afra Alishahi, arXiv:1506.03694

    蒙特利尔大学

    使用基于注意力的编码器-解码器网络描述多媒体内容(Describing Multimedia Content using Attention-based Encoder-Decoder Networks)

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, arXiv:1507.01053

    康奈尔

    图像表示和神经图像说明的新领域(Image Representations and New Domains in Neural Image Captioning)

    Jack Hessel, Nicolas Savva, Michael J. Wilber, arXiv:1508.02091

    MS + City Univ. of HongKong

    Learning Query and Image Similarities with Ranking Canonical Correlation Analysis

    Ting Yao, Tao Mei, and Chong-Wah Ngo, ICCV, 2015

    视频字幕(Video Captioning)

    伯克利

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

    犹他州/ UML / 伯克利

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

    微软

    Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

    犹他州/ UML / 伯克利

    Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.

    蒙特利尔大学/ 舍布鲁克

    Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

    MPI / 伯克利

    Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698

    多伦多大学 / MIT

    Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

    蒙特利尔大学

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

    TAU / 美国南加州大学

    Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

    图像生成

    卷积/循环网络
    • 论文:Conditional Image Generation with PixelCNN Decoders

    作者:Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

    • 论文:Learning to Generate Chairs with Convolutional Neural Networks

    作者:Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

    发表:CVPR, 2015.

    • 论文:DRAW: A Recurrent Neural Network For Image Generation

    作者:Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

    发表:ICML, 2015.

    对抗网络
    • 论文:生成对抗网络(Generative Adversarial Networks)

    作者:Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

    发表:NIPS, 2014.

    • 论文:使用对抗网络Laplacian Pyramid 的深度生成图像模型(Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks)

    作者:Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    发表:NIPS, 2015.

    • 论文:关于生成模型评估的说明 (A note on the evaluation of generative models)

    作者:Lucas Theis, Aäron van den Oord, Matthias Bethge

    发表:ICLR 2016.

    • 论文:变分自动编码深度高斯过程(Variationally Auto-Encoded Deep Gaussian Processes)

    作者:Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence

    发表:ICLR 2016.

    • 论文:用注意力机制从字幕生成图像 (Generating Images from Captions with Attention)

    作者:Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

    发表: ICLR 2016

    • 论文:分类生成对抗网络的无监督和半监督学习(Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks)

    作者:Jost Tobias Springenberg

    发表:ICLR 2016

    • 论文:利用对抗者审查表征(Censoring Representations with an Adversary)

    作者:Harrison Edwards, Amos Storkey

    发表:ICLR 2016

    • 论文:基于虚拟对抗训练的分布平滑 (Distributional Smoothing with Virtual Adversarial Training)

    作者:Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

    发表:ICLR 2016

    • 论文:自然图像流形上的生成视觉操作(Generative Visual Manipulation on the Natural Image Manifold)

    作者:朱俊彦, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros

    发表: ECCV 2016.

    • 论文:深度卷积生成对抗网络的无监督表示学习(Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks)

    作者:Alec Radford, Luke Metz, Soumith Chintala

    发表: ICLR 2016

    问题回答


    弗吉尼亚大学 / 微软研究院

    论文:VQA: Visual Question Answering, CVPR, 2015 SUNw:Scene Understanding workshop.

    作者:Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

    MPI / 伯克利

    论文:Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

    作者:Mateusz Malinowski, Marcus Rohrbach, Mario Fritz,

    发布 : arXiv:1505.01121.

    多伦多

    论文: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

    作者:Mengye Ren, Ryan Kiros, Richard Zemel

    发表: arXiv:1505.02074 / ICML 2015 deep learning workshop.

    百度/ 加州大学洛杉矶分校

    作者:Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, 徐伟

    论文:Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

    发表: arXiv:1505.05612.

    POSTECH(韩国)

    论文:Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

    作者:Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han

    发表: arXiv:1511.05765

    CMU / 微软研究院

    论文:Stacked Attention Networks for Image Question Answering

    作者:Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015)

    发表: arXiv:1511.02274.

    MetaMind

    论文:Dynamic Memory Networks for Visual and Textual Question Answering

    作者:Xiong, Caiming, Stephen Merity, and Richard Socher

    发表: arXiv:1603.01417 (2016).

    首尔国立大学 + NAVER

    论文:Multimodal Residual Learning for Visual QA

    作者:Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    发表:arXiv:1606.01455

    UC Berkeley + 索尼

    论文:Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    作者:Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach

    发表:arXiv:1606.01847

    Postech

    论文:Training Recurrent Answering Units with Joint Loss Minimization for VQA

    作者:Hyeonwoo Noh and Bohyung Han

    发表: arXiv:1606.03647

    首尔国立大学 + NAVER

    论文: Hadamard Product for Low-rank Bilinear Pooling

    作者:Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    发表:arXiv:1610.04325.

    视觉注意力和显著性

    论文:Predicting Eye Fixations using Convolutional Neural Networks

    作者:Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu

    发表:CVPR, 2015.

    学习地标的连续搜索

    论文:Learning a Sequential Search for Landmarks

    作者:Saurabh Singh, Derek Hoiem, David Forsyth

    发表:CVPR, 2015.

    视觉注意力机制实现多物体识别

    论文:Multiple Object Recognition with Visual Attention

    作者:Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu,

    发表:ICLR, 2015.

    视觉注意力机制的循环模型

    作者:Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

    论文:Recurrent Models of Visual Attention

    发表:NIPS, 2014.

    低级视觉

    超分辨率
    • Iterative Image Reconstruction

    Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.

    Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001.

    • Super-Resolution (SRCNN)

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.

    • Very Deep Super-Resolution

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015.

    • Deeply-Recursive Convolutional Network

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015.

    • Cascade-Sparse-Coding-Network

    Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.

    • Perceptual Losses for Super-Resolution

    Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.

    • SRGAN

    Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016.

    其他应用

    Optical Flow (FlowNet)

    Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.

    Compression Artifacts Reduction

    Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.

    Blur Removal

    Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444

    Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015

    Image Deconvolution

    Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.

    Deep Edge-Aware Filter

    Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.

    Computing the Stereo Matching Cost with a Convolutional Neural Network

    Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.

    Colorful Image Colorization Richard Zhang, Phillip Isola, Alexei A. Efros, ECCV, 2016

    Feature Learning by Inpainting

    Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

    边缘检测

    Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.

    DeepEdge

    Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.

    DeepContour

    Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

    语义分割


    SEC: Seed, Expand and Constrain

    Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016.

    Adelaide

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van dan Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. (1st ranked in VOC2012)

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. (4th ranked in VOC2012)

    Deep Parsing Network (DPN)

    Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 (2nd ranked in VOC 2012)

    CentraleSuperBoundaries, INRIA

    Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1511.07386 (4th ranked in VOC 2012)

    BoxSup

    Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)

    POSTECH

    Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. (7th ranked in VOC2012)

    Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924.

    Seunghoon Hong,Junhyuk Oh,Bohyung Han, andHonglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928

    Conditional Random Fields as Recurrent Neural Networks

    Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)

    DeepLab

    Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. (9th ranked in VOC2012)

    Zoom-out

    Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015

    Joint Calibration

    Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.

    Fully Convolutional Networks for Semantic Segmentation

    Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.

    Hypercolumn

    Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.

    Deep Hierarchical Parsing

    Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.

    Learning Hierarchical Features for Scene Labeling

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.

    University of Cambridge

    Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015.

    Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015.

    Princeton

    Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016

    Univ. of Washington, Allen AI

    Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015

    INRIA

    Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016

    UCSB

    Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015

    其他资源

    课程

    深度视觉

    [斯坦福] CS231n: Convolutional Neural Networks for Visual Recognition

    [香港中文大学] ELEG 5040: Advanced Topics in Signal Processing(Introduction to Deep Learning)

    · 更多深度课程推荐

    [斯坦福] CS224d: Deep Learning for Natural Language Processing

    [牛津] Deep Learning by Prof. Nando de Freitas

    [纽约大学] Deep Learning by Prof. Yann LeCun

    图书

    免费在线图书

    Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

    Neural Networks and Deep Learning by Michael Nielsen

    Deep Learning Tutorial by LISA lab, University of Montreal

    视频

    演讲

    Deep Learning, Self-Taught Learning and Unsupervised Feature Learning By Andrew Ng

    Recent Developments in Deep Learning By Geoff Hinton

    The Unreasonable Effectiveness of Deep Learning by Yann LeCun

    Deep Learning of Representations by Yoshua Bengio

    软件

    框架
    • Tensorflow: An open source software library for numerical computation using data flow graph by Google [Web]
    • Torch7: Deep learning library in Lua, used by Facebook and Google Deepmind [Web]
    • Torch-based deep learning libraries: [torchnet]
    • Caffe: Deep learning framework by the BVLC [Web]
    • Theano: Mathematical library in Python, maintained by LISA lab [Web]
    • Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
    • MatConvNet: CNNs for MATLAB [Web]
    • MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
    • Deepgaze: A computer vision library for human-computer interaction based on CNNs [Web]

    应用

    • 对抗训练 Code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
    • 理解与可视化 Source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
    • 词义分割 Source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web] ; Source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
    • 超分辨率 Image Super-Resolution for Anime-Style-Art [Web]
    • 边缘检测 Source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
    • Source code for the paper “Holistically-Nested Edge Detection”, ICCV 2015. [Web]

    讲座

    • [CVPR 2014] Tutorial on Deep Learning in Computer Vision
    • [CVPR 2015] Applied Deep Learning for Computer Vision with Torch

    博客

    • Deep down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision Blog
    • CVPR recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s Blog
    • Facebook’s AI Painting@Wired
    • Inceptionism: Going Deeper into Neural Networks@Google Research
    • Implementing Neural networks
  • 计算机视觉经典论文整理

    千次阅读 2019-08-24 13:25:09
    计算机视觉论文 ImageNet分类 物体检测 物体跟踪 低级视觉 边缘检测 语义分割 视觉注意力和显著性 物体识别 人体姿态估计 CNN原理和性质(Understanding CNN) 图像和语言 图像解说 视频解说 图像生成...

    经典论文

    计算机视觉论文

    1. ImageNet分类
    2. 物体检测
    3. 物体跟踪
    4. 低级视觉
    5. 边缘检测
    6. 语义分割
    7. 视觉注意力和显著性
    8. 物体识别
    9. 人体姿态估计
    10. CNN原理和性质(Understanding CNN)
    11. 图像和语言
    12. 图像解说
    13. 视频解说
    14. 图像生成

    微软ResNet

    论文:用于图像识别的深度残差网络

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1512.03385v1.pdf

    微软PRelu(随机纠正线性单元/权重初始化)

    论文:深入学习整流器:在ImageNet分类上超越人类水平

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1502.01852.pdf

    谷歌Batch Normalization

    论文:批量归一化:通过减少内部协变量来加速深度网络训练

    作者:Sergey Ioffe, Christian Szegedy

    链接:http://arxiv.org/pdf/1502.03167.pdf

    谷歌GoogLeNet

    论文:更深的卷积,CVPR 2015

    作者:Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

    链接:http://arxiv.org/pdf/1409.4842.pdf

    牛津VGG-Net

    论文:大规模视觉识别中的极深卷积网络,ICLR 2015

    作者:Karen Simonyan & Andrew Zisserman

    链接:http://arxiv.org/pdf/1409.1556.pdf

    AlexNet

    论文:使用深度卷积神经网络进行ImageNet分类

    作者:Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

    链接:http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

    物体检测

    这里写图片描述

    PVANET

    论文:用于实时物体检测的深度轻量神经网络(PVANET:Deep but Lightweight Neural Networks for Real-time Object Detection)

    作者:Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

    链接:http://arxiv.org/pdf/1608.08021

    纽约大学OverFeat

    论文:使用卷积网络进行识别、定位和检测(OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks),ICLR 2014

    作者:Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

    链接:http://arxiv.org/pdf/1312.6229.pdf

    伯克利R-CNN

    论文:精确物体检测和语义分割的丰富特征层次结构(Rich feature hierarchies for accurate object detection and semantic segmentation),CVPR 2014

    作者:Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

    链接:http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

    微软SPP

    论文:视觉识别深度卷积网络中的空间金字塔池化(Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition),ECCV 2014

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1406.4729.pdf

    微软Fast R-CNN

    论文:Fast R-CNN

    作者:Ross Girshick

    链接:http://arxiv.org/pdf/1504.08083.pdf

    微软Faster R-CNN

    论文:使用RPN走向实时物体检测(Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks)

    作者:任少卿、何恺明、Ross Girshick、孙剑

    链接:http://arxiv.org/pdf/1506.01497.pdf

    牛津大学R-CNN minus R

    论文:R-CNN minus R

    作者:Karel Lenc, Andrea Vedaldi

    链接:http://arxiv.org/pdf/1506.06981.pdf

    端到端行人检测

    论文:密集场景中端到端的行人检测(End-to-end People Detection in Crowded Scenes)

    作者:Russell Stewart, Mykhaylo Andriluka

    链接:http://arxiv.org/pdf/1506.04878.pdf

    实时物体检测

    论文:你只看一次:统一实时物体检测(You Only Look Once: Unified, Real-Time Object Detection)

    作者:Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

    链接:http://arxiv.org/pdf/1506.02640.pdf

    Inside-Outside Net

    论文:使用跳跃池化和RNN在场景中检测物体(Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks)

    作者:Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

    链接:http://arxiv.org/abs/1512.04143.pdf

    微软ResNet

    论文:用于图像识别的深度残差网络

    作者:何恺明、张祥雨、任少卿和孙剑

    链接:http://arxiv.org/pdf/1512.03385v1.pdf

    R-FCN

    论文:通过区域全卷积网络进行物体识别(R-FCN: Object Detection via Region-based Fully Convolutional Networks)

    作者:代季峰,李益,何恺明,孙剑

    链接:http://arxiv.org/abs/1605.06409

    SSD

    论文:单次多框检测器(SSD: Single Shot MultiBox Detector)

    作者:Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

    链接:http://arxiv.org/pdf/1512.02325v2.pdf

    速度/精度权衡

    论文:现代卷积物体检测器的速度/精度权衡(Speed/accuracy trade-offs for modern convolutional object detectors)

    作者:Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

    链接:http://arxiv.org/pdf/1611.10012v1.pdf

    物体跟踪

    • 论文:用卷积神经网络通过学习可区分的显著性地图实现在线跟踪(Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network)

    作者:Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

    地址:arXiv:1502.06796.

    • 论文:DeepTrack:通过视觉跟踪的卷积神经网络学习辨别特征表征(DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking)

    作者:Hanxi Li, Yi Li and Fatih Porikli

    发表: BMVC, 2014.

    • 论文:视觉跟踪中,学习深度紧凑图像表示(Learning a Deep Compact Image Representation for Visual Tracking)

    作者:N Wang, DY Yeung

    发表:NIPS, 2013.

    • 论文:视觉跟踪的分层卷积特征(Hierarchical Convolutional Features for Visual Tracking)

    作者:Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang

    发表: ICCV 2015

    • 论文:完全卷积网络的视觉跟踪(Visual Tracking with fully Convolutional Networks)

    作者:Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu,

    发表:ICCV 2015

    • 论文:学习多域卷积神经网络进行视觉跟踪(Learning Multi-Domain Convolutional Neural Networks for Visual Tracking)

    作者:Hyeonseob Namand Bohyung Han

    对象识别(Object Recognition)

    论文:卷积神经网络弱监督学习(Weakly-supervised learning with convolutional neural networks)

    作者:Maxime Oquab,Leon Bottou,Ivan Laptev,Josef Sivic,CVPR,2015

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf

    FV-CNN

    论文:深度滤波器组用于纹理识别和分割(Deep Filter Banks for Texture Recognition and Segmentation)

    作者:Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf

    人体姿态估计(Human Pose Estimation)

    • 论文:使用 Part Affinity Field的实时多人2D姿态估计(Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields)

    作者:Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, CVPR, 2017.

    • 论文:Deepcut:多人姿态估计的联合子集分割和标签(Deepcut: Joint subset partition and labeling for multi person pose estimation)

    作者:Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, CVPR, 2016.

    • 论文:Convolutional pose machines

    作者:Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, CVPR, 2016.

    • 论文:人体姿态估计的 Stacked hourglass networks(Stacked hourglass networks for human pose estimation)

    作者:Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV, 2016.

    • 论文:用于视频中人体姿态估计的Flowing convnets(Flowing convnets for human pose estimation in videos)

    作者:Tomas Pfister, James Charles, and Andrew Zisserman, ICCV, 2015.

    • 论文:卷积网络和人类姿态估计图模型的联合训练(Joint training of a convolutional network and a graphical model for human pose estimation)

    作者:Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, NIPS, 2014.

    理解CNN

    这里写图片描述

    • 论文:通过测量同变性和等价性来理解图像表示(Understanding image representations by measuring their equivariance and equivalence)

    作者:Karel Lenc, Andrea Vedaldi, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

    • 论文:深度神经网络容易被愚弄:无法识别的图像的高置信度预测(Deep Neural Networks are Easily Fooled:High Confidence Predictions for Unrecognizable Images)

    作者:Anh Nguyen, Jason Yosinski, Jeff Clune, CVPR, 2015.

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

    • 论文:通过反演理解深度图像表示(Understanding Deep Image Representations by Inverting Them)

    作者:Aravindh Mahendran, Andrea Vedaldi, CVPR, 2015

    链接:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

    • 论文:深度场景CNN中的对象检测器(Object Detectors Emerge in Deep Scene CNNs)

    作者:Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, ICLR, 2015.

    链接:http://arxiv.org/abs/1412.6856

    • 论文:用卷积网络反演视觉表示(Inverting Visual Representations with Convolutional Networks)

    作者:Alexey Dosovitskiy, Thomas Brox, arXiv, 2015.

    链接:http://arxiv.org/abs/1506.02753

    • 论文:可视化和理解卷积网络(Visualizing and Understanding Convolutional Networks)

    作者:Matthrew Zeiler, Rob Fergus, ECCV, 2014.

    链接:http://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

    图像与语言

    图像说明(Image Captioning)

    这里写图片描述

    UCLA / Baidu

    用多模型循环神经网络解释图像(Explain Images with Multimodal Recurrent Neural Networks)

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, arXiv:1410.1090

    http://arxiv.org/pdf/1410.1090

    Toronto

    使用多模型神经语言模型统一视觉语义嵌入(Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models)

    Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, arXiv:1411.2539.

    http://arxiv.org/pdf/1411.2539

    Berkeley

    用于视觉识别和描述的长期循环卷积网络(Long-term Recurrent Convolutional Networks for Visual Recognition and Description)

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, arXiv:1411.4389.

    http://arxiv.org/pdf/1411.4389

    Google

    看图写字:神经图像说明生成器(Show and Tell: A Neural Image Caption Generator)

    Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, arXiv:1411.4555.

    http://arxiv.org/pdf/1411.4555

    Stanford

    用于生成图像描述的深度视觉语义对齐(Deep Visual-Semantic Alignments for Generating Image Description)

    Andrej Karpathy, Li Fei-Fei, CVPR, 2015.

    Web:http://cs.stanford.edu/people/karpathy/deepimagesent/

    Paper:http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

    UML / UT

    使用深度循环神经网络将视频转换为自然语言(Translating Videos to Natural Language Using Deep Recurrent Neural Networks)

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, NAACL-HLT, 2015.

    http://arxiv.org/pdf/1412.4729

    CMU / Microsoft

    学习图像说明生成的循环视觉表示(Learning a Recurrent Visual Representation for Image Caption Generation)

    Xinlei Chen, C. Lawrence Zitnick, arXiv:1411.5654.

    Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015

    http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

    Microsoft

    从图像说明到视觉概念(From Captions to Visual Concepts and Back)

    Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR, 2015.

    http://arxiv.org/pdf/1411.4952

    Univ. Montreal / Univ. Toronto

    Show, Attend, and Tell:视觉注意力与神经图像标题生成(Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention)

    Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, arXiv:1502.03044 / ICML 2015

    http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

    Idiap / EPFL / Facebook

    基于短语的图像说明(Phrase-based Image Captioning)

    Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, arXiv:1502.03671 / ICML 2015

    http://arxiv.org/pdf/1502.03671

    UCLA / Baidu

    像孩子一样学习:从图像句子描述快速学习视觉的新概念(Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images)

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, arXiv:1504.06692

    http://arxiv.org/pdf/1504.06692

    MS + Berkeley

    探索图像说明的最近邻方法( Exploring Nearest Neighbor Approaches for Image Captioning)

    Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, arXiv:1505.04467

    http://arxiv.org/pdf/1505.04467.pdf

    图像说明的语言模型(Language Models for Image Captioning: The Quirks and What Works)

    Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, arXiv:1505.01809

    http://arxiv.org/pdf/1505.01809.pdf

    阿德莱德

    具有中间属性层的图像说明( Image Captioning with an Intermediate Attributes Layer)

    Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, arXiv:1506.01144

    蒂尔堡

    通过图片学习语言(Learning language through pictures)

    Grzegorz Chrupala, Akos Kadar, Afra Alishahi, arXiv:1506.03694

    蒙特利尔大学

    使用基于注意力的编码器-解码器网络描述多媒体内容(Describing Multimedia Content using Attention-based Encoder-Decoder Networks)

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, arXiv:1507.01053

    康奈尔

    图像表示和神经图像说明的新领域(Image Representations and New Domains in Neural Image Captioning)

    Jack Hessel, Nicolas Savva, Michael J. Wilber, arXiv:1508.02091

    MS + City Univ. of HongKong

    Learning Query and Image Similarities with Ranking Canonical Correlation Analysis

    Ting Yao, Tao Mei, and Chong-Wah Ngo, ICCV, 2015

    视频字幕(Video Captioning)

    伯克利

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

    犹他州/ UML / 伯克利

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

    微软

    Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

    犹他州/ UML / 伯克利

    Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.

    蒙特利尔大学/ 舍布鲁克

    Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

    MPI / 伯克利

    Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698

    多伦多大学 / MIT

    Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

    蒙特利尔大学

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

    TAU / 美国南加州大学

    Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

    图像生成

    卷积/循环网络

    • 论文:Conditional Image Generation with PixelCNN Decoders”

    作者:Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

    • 论文:Learning to Generate Chairs with Convolutional Neural Networks

    作者:Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

    发表:CVPR, 2015.

    • 论文:DRAW: A Recurrent Neural Network For Image Generation

    作者:Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

    发表:ICML, 2015.

    对抗网络

    • 论文:生成对抗网络(Generative Adversarial Networks)

    作者:Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

    发表:NIPS, 2014.

    • 论文:使用对抗网络Laplacian Pyramid 的深度生成图像模型(Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks)

    作者:Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    发表:NIPS, 2015.

    • 论文:生成模型演讲概述 (A note on the evaluation of generative models)

    作者:Lucas Theis, Aäron van den Oord, Matthias Bethge

    发表:ICLR 2016.

    • 论文:变分自动编码深度高斯过程(Variationally Auto-Encoded Deep Gaussian Processes)

    作者:Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence

    发表:ICLR 2016.

    • 论文:用注意力机制从字幕生成图像 (Generating Images from Captions with Attention)

    作者:Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

    发表: ICLR 2016

    • 论文:分类生成对抗网络的无监督和半监督学习(Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks)

    作者:Jost Tobias Springenberg

    发表:ICLR 2016

    • 论文:用一个对抗检测表征(Censoring Representations with an Adversary)

    作者:Harrison Edwards, Amos Storkey

    发表:ICLR 2016

    • 论文:虚拟对抗训练实现分布式顺滑 (Distributional Smoothing with Virtual Adversarial Training)

    作者:Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

    发表:ICLR 2016

    • 论文:自然图像流形上的生成视觉操作(Generative Visual Manipulation on the Natural Image Manifold)

    作者:朱俊彦, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros

    发表: ECCV 2016.

    • 论文:深度卷积生成对抗网络的无监督表示学习(Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks)

    作者:Alec Radford, Luke Metz, Soumith Chintala

    发表: ICLR 2016

    问题回答

    这里写图片描述

    弗吉尼亚大学 / 微软研究院

    论文:VQA: Visual Question Answering, CVPR, 2015 SUNw:Scene Understanding workshop.

    作者:Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

    MPI / 伯克利

    论文:Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

    作者:Mateusz Malinowski, Marcus Rohrbach, Mario Fritz,

    发布 : arXiv:1505.01121.

    多伦多

    论文: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

    作者:Mengye Ren, Ryan Kiros, Richard Zemel

    发表: arXiv:1505.02074 / ICML 2015 deep learning workshop.

    百度/ 加州大学洛杉矶分校

    作者:Hauyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, 徐伟

    论文:Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

    发表: arXiv:1505.05612.

    POSTECH(韩国)

    论文:Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

    作者:Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han

    发表: arXiv:1511.05765

    CMU / 微软研究院

    论文:Stacked Attention Networks for Image Question Answering

    作者:Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015)

    发表: arXiv:1511.02274.

    MetaMind

    论文:Dynamic Memory Networks for Visual and Textual Question Answering

    作者:Xiong, Caiming, Stephen Merity, and Richard Socher

    发表: arXiv:1603.01417 (2016).

    首尔国立大学 + NAVER

    论文:Multimodal Residual Learning for Visual QA

    作者:Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    发表:arXiv:1606:01455

    UC Berkeley + 索尼

    论文:Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    作者:Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach

    发表:arXiv:1606.01847

    Postech

    论文:Training Recurrent Answering Units with Joint Loss Minimization for VQA

    作者:Hyeonwoo Noh and Bohyung Han

    发表: arXiv:1606.03647

    首尔国立大学 + NAVER

    论文: Hadamard Product for Low-rank Bilinear Pooling

    作者:Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhan

    发表:arXiv:1610.04325.

    视觉注意力和显著性

    这里写图片描述
    论文:Predicting Eye Fixations using Convolutional Neural Networks

    作者:Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu

    发表:CVPR, 2015.

    学习地标的连续搜索

    作者:Learning a Sequential Search for Landmarks

    论文:Saurabh Singh, Derek Hoiem, David Forsyth

    发表:CVPR, 2015.

    视觉注意力机制实现多物体识别

    论文:Multiple Object Recognition with Visual Attention

    作者:Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu,

    发表:ICLR, 2015.

    视觉注意力机制的循环模型

    作者:Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

    论文:Recurrent Models of Visual Attention

    发表:NIPS, 2014.

    低级视觉

    超分辨率

    • Iterative Image Reconstruction

    Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.

    Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001.

    • Super-Resolution (SRCNN)

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.

    • Very Deep Super-Resolution

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015.

    • Deeply-Recursive Convolutional Network

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015.

    • Cascade-Sparse-Coding-Network

    Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.

    • Perceptual Losses for Super-Resolution

    Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.

    • SRGAN

    Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016.
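
    As referenced in the SRCNN entry above, here is a rough, illustrative Keras sketch of the three-layer SRCNN pipeline from the Dong et al. papers (patch extraction, non-linear mapping, reconstruction, trained with an MSE loss on bicubic-upscaled inputs). The layer widths follow the 9-1-5 configuration described in the paper; everything else (function name, training setup) is my own scaffolding, not the authors' released code.

    ```python
    # Rough sketch of the SRCNN architecture (Dong et al.); not the authors' code.
    from tensorflow.keras import layers, models

    def build_srcnn(channels=1):
        # Input: a bicubic-upscaled low-resolution image of arbitrary size.
        inputs = layers.Input(shape=(None, None, channels))
        # Patch extraction and representation: 64 filters of 9x9.
        x = layers.Conv2D(64, 9, padding="same", activation="relu")(inputs)
        # Non-linear mapping: 32 filters of 1x1.
        x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
        # Reconstruction: back to `channels` with a 5x5 filter, linear output.
        outputs = layers.Conv2D(channels, 5, padding="same")(x)
        model = models.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="mse")  # SRCNN is trained on an MSE loss
        return model

    if __name__ == "__main__":
        build_srcnn().summary()
    ```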

    Other Applications

    Optical Flow (FlowNet)

    Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.

    Compression Artifacts Reduction

    Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.

    Blur Removal

    Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444

    Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015

    Image Deconvolution

    Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.

    Deep Edge-Aware Filter

    Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.

    Computing the Stereo Matching Cost with a Convolutional Neural Network

    Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.

    Colorful Image Colorization

    Richard Zhang, Phillip Isola, Alexei A. Efros, Colorful Image Colorization, ECCV, 2016

    Feature Learning by Inpainting

    Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

    Edge Detection

    Holistically-Nested Edge Detection

    Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.

    DeepEdge

    Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.

    DeepContour

    Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

    Semantic Segmentation


    SEC: Seed, Expand and Constrain

    Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016.

    Adelaide

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. (1st ranked in VOC2012)

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. (4th ranked in VOC2012)

    Deep Parsing Network (DPN)

    Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 (2nd ranked in VOC 2012)

    CentraleSuperBoundaries, INRIA

    Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1411.07386 (4th ranked in VOC 2012)

    BoxSup

    Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)

    POSTECH

    Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. (7th ranked in VOC2012)

    Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924.

    Seunghoon Hong, Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928

    Conditional Random Fields as Recurrent Neural Networks

    Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)

    DeepLab

    Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation, arXiv:1502.02734. (9th ranked in VOC2012)

    Zoom-out

    Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015

    Joint Calibration

    Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.

    Fully Convolutional Networks for Semantic Segmentation

    Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.

    Hypercolumn

    Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.

    Deep Hierarchical Parsing

    Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.

    Learning Hierarchical Features for Scene Labeling

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.

    University of Cambridge

    Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015.

    Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015.

    Princeton

    Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016

    Univ. of Washington, Allen AI

    Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015

    INRIA

    Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016

    UCSB

    Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015

    Other Resources

    Courses

    Deep Vision

    [Stanford] CS231n: Convolutional Neural Networks for Visual Recognition

    [CUHK] ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)

    · More deep learning course recommendations

    [Stanford] CS224d: Deep Learning for Natural Language Processing

    [Oxford] Deep Learning by Prof. Nando de Freitas

    [NYU] Deep Learning by Prof. Yann LeCun

    Books

    Free Online Books

    Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

    Neural Networks and Deep Learning by Michael Nielsen

    Deep Learning Tutorial by LISA lab, University of Montreal

    Videos

    Talks

    Deep Learning, Self-Taught Learning and Unsupervised Feature Learning By Andrew Ng

    Recent Developments in Deep Learning By Geoff Hinton

    The Unreasonable Effectiveness of Deep Learning by Yann LeCun

    Deep Learning of Representations by Yoshua Bengio

    Software

    Frameworks (a short usage sketch follows the list)

    • TensorFlow: An open-source software library for numerical computation using data flow graphs, by Google [Web]
    • Torch7: Deep learning library in Lua, used by Facebook and Google DeepMind [Web]
    • Torch-based deep learning libraries: [torchnet]
    • Caffe: Deep learning framework by the BVLC [Web]
    • Theano: Mathematical library in Python, maintained by LISA lab [Web]
    • Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
    • MatConvNet: CNNs for MATLAB [Web]
    • MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
    • Deepgaze: A computer vision library for human-computer interaction based on CNNs [Web]
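
    To make the list above concrete, here is a minimal sketch of the kind of workflow these frameworks support: loading a pretrained CNN and classifying a single image. It assumes Keras with a TensorFlow backend; the backend choice and the file name example.jpg are illustrative, not tied to any entry in the list.

    ```python
    # Minimal sketch: classify one image with a pretrained CNN (assumes tensorflow/Keras).
    import numpy as np
    from tensorflow.keras.applications.resnet50 import (
        ResNet50, preprocess_input, decode_predictions,
    )
    from tensorflow.keras.preprocessing import image

    model = ResNet50(weights="imagenet")  # downloads ImageNet weights on first use

    img = image.load_img("example.jpg", target_size=(224, 224))  # illustrative file name
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
        print(f"{label}: {score:.3f}")
    ```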

    Applications

    • Adversarial training: Code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
    • Understanding and visualization: Source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
    • Semantic segmentation: Source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web] ; Source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
    • Super-resolution: Image Super-Resolution for Anime-Style Art [Web]
    • Edge detection: Source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
    • Source code for the paper “Holistically-Nested Edge Detection”, ICCV 2015. [Web]

    Tutorials

    • [CVPR 2014] Tutorial on Deep Learning in Computer Vision
    • [CVPR 2015] Applied Deep Learning for Computer Vision with Torch

    Blogs

    • Deep down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision Blog
    • CVPR recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s Blog
    • Facebook’s AI Painting@Wired
    • Inceptionism: Going Deeper into Neural Networks@Google Research
    • Implementing Neural Networks
  • Experience with papers - computer vision (CV)

    Date: 2021/10/10.

    Preface

    This post shares some experience with searching for and reading papers in computer vision (CV). It covers an overview of the top conferences and journals, how to search for papers, and reading tips.

    Top Conferences and Journals

    For more detail, refer to the CCF recommended conference/journal list; this post only summarizes the common top venues in the following two categories:

    Top Conferences

    Class A

    • CVPR: IEEE/CVF Conference on Computer Vision and Pattern Recognition
    • ICCV: International Conference on Computer Vision
    • ICML: International Conference on Machine Learning
    • NIPS: Annual Conference on Neural Information Processing Systems
    • AAAI: AAAI Conference on Artificial Intelligence
    • ACM MM: ACM International Conference on Multimedia
    • SIGGRAPH: ACM SIGGRAPH Annual Conference
    • IJCAI: International Joint Conference on Artificial Intelligence

    Class B

    • ECCV: European Conference on Computer Vision

    Not yet ranked by CCF

    • ICLR: International Conference on Learning Representations

    Top Journals

    Class A

    • TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
    • IJCV: International Journal of Computer Vision
    • TIP: IEEE Transactions on Image Processing

    Class B

    • TNNLS: IEEE Transactions on Neural Networks and Learning Systems
    • Pattern Recognition

    Searching for Papers

    Official venue websites

    1. IEEE: IEEE Xplore
    2. CVPR 2020: CVPR 2020 Open Access Repository
    3. dblp (CVPR): CVPR
    4. dblp (ICCV): ICCV
    5. dblp (ECCV): ECCV
    6. And so on

    General databases

    1. dblp: dblp computer science bibliography
    2. arXiv: arXiv.org e-Print archive
    3. Baidu Scholar (百度学术)
    4. Google Scholar
    5. CNKI (中国知网)
    6. Wanfang Data (万方数据知识服务平台)

    Specialized search

    1. Papers with Code: The latest in Machine Learning | Papers With Code
    2. Papers Without Code: where unreproducible papers come to live

    GitHub

    GitHub repositories whose names contain “Awesome” usually curate the strongest papers in a given area (see the search sketch after the examples below).

    For example:
    Object detection: GitHub - amusi/awesome-object-detection
    Semantic segmentation: GitHub - mrgloom/awesome-semantic-segmentation
    GANs: GitHub - kozistr/Awesome-GANs
    Image colorization: GitHub - MarkMoHR/Awesome-Image-Colorization
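
    As referenced above, here is a hedged sketch of finding such “Awesome” lists programmatically through GitHub's public repository-search endpoint (https://api.github.com/search/repositories). The endpoint and its q/sort/order parameters are the standard GitHub REST search; the query string and helper name are illustrative, and unauthenticated requests are rate-limited.

    ```python
    # Sketch: search GitHub for "awesome" curated lists on a topic (stdlib only).
    import json
    import urllib.parse
    import urllib.request

    def find_awesome_repos(topic, top=5):
        query = urllib.parse.quote(f"awesome {topic} in:name,description")
        url = f"https://api.github.com/search/repositories?q={query}&sort=stars&order=desc"
        with urllib.request.urlopen(url) as resp:
            items = json.load(resp)["items"][:top]
        return [(repo["stargazers_count"], repo["full_name"], repo["html_url"]) for repo in items]

    if __name__ == "__main__":
        for stars, name, link in find_awesome_repos("object detection"):
            print(f"{stars:>6}  {name}  {link}")
    ```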

    WeChat public accounts

    1. CVer
    2. 计算机视觉life (Computer Vision Life)
    3. 机器学习与生成对抗网络 (Machine Learning and GANs)
    4. And so on

    Search workflow

    Chinese literature
    Search databases that index Chinese publications, such as CNKI, Wanfang Data, Baidu Scholar, and Google Scholar.

    Foreign-language literature
    Search the top venues of the relevant area (e.g., the three top computer vision conferences), or general databases such as dblp and arXiv.

    Reading Tips

    General advice

    1. When you start reading papers, do not chase speed; try to understand every concept in the paper and build a solid foundation, so the knowledge transfers to related work.
    2. Blog posts can help you understand a paper, but do not read only the blogs and skip the paper itself.
    3. Give yourself time to digest; do not just read paper after paper. When you are tired of reading, switch to something else such as reading code, running experiments, or working on a project. The material sinks in better that way.

    File naming convention for papers (a small formatting sketch follows the two patterns below)

    Conference/Journal_Year_Title.pdf

    Conference/Journal_Year_Author_Title.pdf
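
    A minimal sketch that turns paper metadata into the two naming patterns above; the function name and the character-sanitisation rule are my own illustration, not part of the original convention.

    ```python
    # Sketch: build "Venue_Year[_Author]_Title.pdf" file names for downloaded papers.
    import re

    def paper_filename(venue, year, title, author=None):
        # Strip characters that are invalid in common file systems.
        title = re.sub(r'[\\/:*?"<>|]', "", title).strip()
        parts = [venue, str(year)] + ([author] if author else []) + [title]
        return "_".join(parts) + ".pdf"

    print(paper_filename("CVPR", 2015, "Fully Convolutional Networks for Semantic Segmentation"))
    # -> CVPR_2015_Fully Convolutional Networks for Semantic Segmentation.pdf
    print(paper_filename("NIPS", 2014, "Recurrent Models of Visual Attention", author="Mnih"))
    # -> NIPS_2014_Mnih_Recurrent Models of Visual Attention.pdf
    ```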

    Paper contents (summary template)

    • Method overview: overall framework
    • Paper info: title, venue, date, authors
    • Abstract: motivation, contributions, method, experiments
    • Motivation: difficulties and challenges; what problem is solved
    • Contributions: contribution points and their effect
    • Method: general-to-specific structure, overall framework, detailed description
    • Experiments: quantitative, qualitative, and ablation experiments
    • Conclusion: contributions, strengths, weaknesses, why the paper is worth reading, how it helps you, and what others can borrow from it

    Paper code

    • If the authors have open-sourced their code, it is usually mentioned in the paper's Abstract or Introduction (possibly in a footnote).
    • If the authors have not released code, third-party implementations can often be found on GitHub.
    • Searching on Papers with Code also tells you whether a paper has a matching implementation (a URL-building sketch follows this list).
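
    As mentioned in the last bullet, here is a small sketch that builds a Papers with Code search URL from a paper title; the /search?q= pattern simply mirrors the site's search box and should be treated as an assumption rather than a documented API.

    ```python
    # Sketch: open a Papers with Code search for a paper title.
    import urllib.parse
    import webbrowser

    def paperswithcode_search(title):
        # Assumed search-URL pattern, mirroring the site's search box.
        return "https://paperswithcode.com/search?q=" + urllib.parse.quote_plus(title)

    url = paperswithcode_search("Stacked Attention Networks for Image Question Answering")
    print(url)
    # webbrowser.open(url)  # optionally open it in the default browser
    ```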

    Understanding a concept

    Work out the 5 Ws of a concept: What, Why, How, When, Where.
    Start with the more basic three (What, Why, How), then the remaining two (When, Where).

    • What: the definition, i.e. the meaning, both formal and intuitive.
    • Why: the motivation; what problem it solves.
    • How: the procedure; how the problem is solved.
    • When: when to use it.
    • Where: where it is used.

    Understanding equations

    • When reading an equation, first understand what it is for, then work out what each symbol stands for, and use the explanation that follows it in the paper.
    • If that is not enough, search for blog posts that explain it.
    • Failing that, read the cited references, and their references.
    • Failing that, ask senior students.

    Explaining an equation

    1. Purpose
    2. Symbols
    3. Principle

    Skimming an English paper in 20 minutes

    Prerequisite: the Youdao Dictionary word-lookup and selection-translation plug-in.

    5 + 5 + 10 minutes:

    5 minutes: abstract, motivation + contributions.

    5 minutes: overall network architecture.

    10 minutes: details (method details + loss functions).

    Tools

    • Adobe Acrobat: professional PDF viewer/editor; organize and edit pages, highlight, annotate, export to docx.
    • Typora: Markdown editor with LaTeX math and code-block highlighting; handy for notes.
    • Youdao Dictionary: word-lookup and selection-translation plug-ins; set it to launch at startup.
    • Mind map: track papers you have read, are reading, and plan to read, organized by topic.

    Notes

    • Paper summaries
    • Background knowledge
    • English technical terms