• https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html#r-cnn

    Original post: http://www.cnblogs.com/scnucs/archive/2012/04/18/2455406.html

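    Below are some computer vision and pattern recognition resources I collected and organized a while ago, shared here for everyone. From now on I will treat image processing as a genuine hobby; perhaps it is better not to make research my means of earning a living.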

     

    | Title | Author | Topic | Keywords | Type | Source | Notes |
    |---|---|---|---|---|---|---|
    | nipsfast.ppt | Nando de Freitas | N-Body problems in learning | Fast N-Body Learning | ppt | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | nipsfgtf.ppt | Ramani Duraiswami | Fast Multipole Methods, Fast Gaussian Transform | FM and FGT | ppt | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | Gray.pdf/ppt | Alex Gray | Statistical N-Body/Proximity Data Structures | N-Body and Data Structures | ppt/pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | dt-nips04.pdf/ppt | Dan Huttenlocher | Fast Distance Transforms | FDT | ppt/pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | High.pdf/ppt | Alexander Gray | Fast high-dimensional function integration | Fast integration | ppt/pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | Fast04.pdf/ppt | David Lowe | Fast high-dimensional feature indexing for object recognition | Feature indexing | ppt/pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | ihler-fast.pdf/ppt | Alexander Ihler | Fast methods and non-parametric BP | Non-parametric BP | ppt/pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | fastview.pdf | Dustin Lang | Comparing fast methods | Overview fast methods | pdf | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | nbody_methods.tar.gz | | | | code | http://www.cs.ubc.ca/~awll/nbody_methods.html | |
    | demo_rbpf_gauss.tar | | Rao Blackwellised particle filtering for conditionally Gaussian Models | particle filtering for conditionally | code | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | demorbpfdbn.tar.gz | | Rao Blackwellised Particle Filtering | | code | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | http://www.cs.ubc.ca/~nando/software.html |
    | upf_demos.tar.gz | | Unscented Particle Filter | Particle Filter | code | http://www.cs.ubc.ca/~nando/nipsfast/schedule.html | |
    | BPF_1_3.zip | | Boosted Particle Filter | Tracking | code | http://www.cs.ubc.ca/~okumak/research.html | 1 |
    | flyer_14_800.mpg | | Source image | Database | Image | http://www.cs.ubc.ca/~okumak/research.html | 1 |
    | trans_flyer_14_800.mpg | | image transformed | Database | Image | http://www.cs.ubc.ca/~okumak/research.html | 1 |
    | LBP.c/h | Topi Mäenpää | LBP operator | Texture | code | http://www.ee.oulu.fi/~topiolli/cpplibs/files/ | |
    | calibr_v30.zip | | Camera Calibration | Computer vision | code | http://www.ee.oulu.fi/mvg/page/camera_calibration_toolbox_for_matlab | 2 |
    | | LEAR (Learning and Recognition in Vision) | Common dataset | Human/car horse soccer human actions | dataset | http://lear.inrialpes.fr/data | 3 |
    | Lic.zip/highlight.zip | Robby T. Tan | Color Constancy Through Inverse Intensity Chromaticity Space | Highlight Removal from single image | code | http://www.commsp.ee.ic.ac.uk/~rtan/ | |
    | 2008_oxford_fog.pdf | Robby T. Tan | Defog | Defog from single | pdf | http://www.commsp.ee.ic.ac.uk/~rtan/ | |
    | 08_cvpr.pdf | Robby T. Tan | Defog | Defog from single | pdf | http://www.commsp.ee.ic.ac.uk/~rtan/ | |
    | Retinex_frankle_mccann | | Retinex | | code | http://www.cs.sfu.ca/~colour/publications/IST-2000/ | Some pictures |
    | Retinex_maccann99 | | Retinex | | code | http://www.cs.sfu.ca/~colour/publications/IST-2000/ | Some pictures |
    | Gamut.tar.bz2 | | Retinex | | code | http://kobus.ca/research/programs/colour_constancy/index.html | |
    | Video.avi/dehaze.m | | dehazing | Raanan Fattal | code | http://www.cs.huji.ac.il/~raananf/projects/defog/index.html | |
    | MPTK-Windows-bin-0-5-6-beta.zip | Matching pursuit (MP) | Algorithm | CNRS | code | http://mptk.irisa.fr/downloads | |
    | generateDictionaries.txt | GenerateGabor | Algorithm | | code | http://www.scholarpedia.org/article/Matching_pursuit | |

    Notes:

    1. The videos and source code accompany this paper:

    Kenji Okuma, Ali Taleghani, Nando De Freitas, Jim Little, David G. Lowe. Boosted Particle Filter: Multitarget Detection and Tracking. European Conference on Computer Vision (ECCV), May 2004.

    2. This site also offers other resources for download:

    http://www.ee.oulu.fi/mvg/page/downloads

    3. LEAR is a research group: http://lear.inrialpes.fr/ . Besides the datasets, it also provides some source code.

    Computer Vision Literature and Code Resources

    CVonline

    http://homepages.inf.ed.ac.uk/rbf/CVonline

    http://homepages.inf.ed.ac.uk/rbf/CVonline/unfolded.htm

    http://homepages.inf.ed.ac.uk/rbf/CVonline/CVentry.htm

     

    Major works by Stan Z. Li (李子青):

    Markov Random Field Modeling in Computer Vision

    http://www.cbsr.ia.ac.cn/users/szli/mrf_book/book.html

    Handbook of Face Recognition (PDF)


    http://www.umiacs.umd.edu/~shaohua/papers/zhou04hfr.pdf

     

     

     

    Zhengyou Zhang's work on robust parameter estimation:


    Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting


    http://research.microsoft.com/~zhang/INRIA/Publis/Tutorial-Estim/Main.html

     


    Andrea Fusiello's tutorial on geometry in computer vision: Elements of Geometric Computer Vision

    http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FUSIELLO4/tutorial.html#x1-520007

     

     

    Resources on Markov chain Monte Carlo (MCMC) methods:


    An introduction to Markov chain Monte Carlo

    http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SENEGAS/mcmc.html


    Markov Chain Monte Carlo for Computer Vision: A tutorial at ICCV05

    http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm

     

    Resources on Independent Component Analysis (ICA):

    An ICA-Page

    http://www.cnl.salk.edu/~tony/ica.html

    Fast ICA

    http://www.cis.hut.fi/projects/ica/fastica/

     

    The Kalman Filter (the definitive web page introducing the Kalman filter)
    http://www.cs.unc.edu/~welch/kalman/index.html

     

    Cached k-d tree search for ICP algorithms

    http://kos.informatik.uni-osnabrueck.de/download/3dim2007/paper.html

     

     


    Some computer vision research tools


    Machine Vision Toolbox for Matlab

    http://www.petercorke.com/Machine%20Vision%20Toolbox.html

     


    MATLAB and Octave Functions for Computer Vision and Image Processing

    http://www.csse.uwa.edu.au/~pk/research/matlabfns/

     

    Bayes Net Toolbox for Matlab

    http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

     


    OpenCV (Chinese)

    http://www.opencv.org.cn/index.php/%E9%A6%96%E9%A1%B5

     

    Gandalf (A Computer Vision and Numerical Algorithm Library)

    http://gandalf-library.sourceforge.net/

     

    CMU Computer Vision Home Page

    http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/vision.html

     

    Machine Learning Resource Links

    http://www.cse.ust.hk/~ivor/resource.htm

     

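    The Bayesian Filtering Library (provides a framework for dynamic Bayesian networks, e.g., recursive information processing and analysis, Kalman filtering, particle filtering, and sequential Monte Carlo methods; written in C++)

    http://www.orocos.org/bfl

     

    Optical Flow Algorithm Evaluation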

    http://of-eval.sourceforge.net/

     

    MATLAB code for ICP algorithm

    http://www.usenet.com/newsgroups/comp.graphics.visualization/msg00102.html

     

    Homepages of leading researchers:

    Song-Chun Zhu (朱松纯)

    http://www.stat.ucla.edu/~sczhu/

     

    David Lowe (SIFT) (a very handsome old gentleman ^ ^)

    http://www.cs.ubc.ca/~lowe/

     

    Andrea Vedaldi (SIFT)

    http://vision.ucla.edu/~vedaldi/index.html

     

    Pedro F. Felzenszwalb

    http://people.cs.uchicago.edu/~pff/

     

    Douglas Lanman (a graduate student at Brown who collected a large number of algorithm tutorials and source code on his homepage)

    http://mesh.brown.edu/dlanman/courses.html

     

    Jianbo Shi (the originator of Ncuts)

    http://www.cis.upenn.edu/~jshi/


     

    Active Vision Group (a machine vision research group at Oxford, known for SLAM, surveillance, and navigation)

    http://www.robots.ox.ac.uk/ActiveVision/index.html

     

    Juyang Weng (a machine learning expert, known for Autonomous Mental Development)

    http://www.cse.msu.edu/~weng/

    Test images and videos:

    Middlebury College's Stereo Vision Data Set


    http://cat.middlebury.edu/stereo/data.html

     

     

    Intelligent Vehicle:

    IVSource

    www.ivsource.net

    Robot Car

    http://www.plyojump.com/robot_cars.html

    How to Build a Robot: The Computer Vision Part

    http://www.societyofrobots.com/programming_computer_vision_tutorial.shtml

    Computer vision resources worth following

    Links from 帝腾大学, a university in the United States.

    Camera Calibration Links to toolboxes (mostly MATLAB) for camera calibration.

    Paul Debevec. Modeling and Rendering Architecture from Photographs.

    Marc Pollefeys. Tutorial on 3D Modeling from Images, ECCV 2000.
     

    Available here: notes (12.1MB pdf)

    Richard Szeliski. NIPS 2004 Tutorial on Acquiring Detailed 3D Models From Images and Video.
     

    Available here: slides (37.6 MB, ppt)

    Peter Corke did his thesis work on visual servoing for robot applications and has authored a robotics toolkit and vision toolkit for MATLAB.
     

    local copy of thesis: Corke thesis (4.36 MB, pdf)
    robot toolkit: robot.zip (568 KB, zip)
    vision toolkit: mv.zip (1.08 MB, zip)

    P. D. Kovesi. MATLAB Functions for Computer Vision and Image Analysis.
    School of Computer Science & Software Engineering, The University of Western Australia.
    Available locally as a zip archive MatlabFns.zip (4.8 MB, updated 21 May 2005)

    Philip Torr, among many other contributions, submitted a Structure and Motion toolkit in MATLAB to the MathWorks File Exchange.
    Local copy here: torrsam.zip (2.4 MB, zip).

     

     



    http://blog.sciencenet.cn/home.php?mod=space&uid=454498&do=blog&id=456240


  • How to Get Started with Computer Vision

    2017-12-25 09:37:46

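    The following is compiled from the transcript of a guest talk hosted on June 29, 2017 by "趣直播", a knowledge live-streaming platform.
    Speaker: 罗韵 (Luo Yun)

    Artificial intelligence, machine learning, deep learning, and computer vision have become bellwethers of a new era. This article covers the following points:
    First, if you want to get started with computer vision, which fundamentals do you need?

    Second, as you study this area, which reference books should you know, and which open courses can you take?

    Third, and probably of most interest: as a branch of AI, computer vision inevitably intersects with deep learning, which has also permeated computer vision, image processing, and natural language processing. So this article briefly introduces how computer vision and deep learning are combined.

    Fourth, working in computing, we inevitably engage with open source, so this article introduces some open-source software.

    Fifth, learning or researching computer vision requires reading papers. How to start reading the literature, and how to gradually find your own direction in the field, are briefly covered here as well.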

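    1. Fundamentals

    This section first explains what computer vision is, then covers basic concepts of images and video, camera hardware, and computation on CPUs and GPUs.
    In computer vision we inevitably have to decide whether to compute on the CPU or on the GPU. Then there is its intersection with other disciplines: computer vision can be combined with many fields, and such cross-disciplinary work often has greater significance and practical value. Finally, for readers who have not worked in AI before, for example software developers who want to move into computer vision: how should they make the transition, and which programming languages and mathematics do they need? All of this is covered in this first section.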

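    1.0 What Is Computer Vision

    Computer vision is the science of how to make machines "see".
    More specifically, it uses cameras and computers in place of human eyes to identify, track, and measure targets (machine vision), and further processes the images so that they become better suited for human viewing or for transmission to instruments for inspection.
    As a scientific discipline, computer vision studies the related theories and techniques, attempting to build artificial intelligence systems that can obtain "information" from images or multidimensional data.
    Currently popular directions such as VR, AR, and 3D processing are all part of computer vision.
    Applications of computer vision

    • Autonomous driving
    • Unmanned security
    • Face recognition
    • Vehicle and license plate recognition
    • Reverse image search
    • VR/AR
    • 3D reconstruction
    • Medical image analysis
    • Drones
    • Others

    Having covered what computer vision is, here are some of its current applications. It is practically everywhere, and it covers all of today's hottest startup directions, including the frequently mentioned autonomous driving, unmanned security, and face recognition. Face recognition is by now the most mature of these application areas; others include text recognition, vehicle and license plate recognition, reverse image search, VR/AR, 3D reconstruction, and the very promising field of medical image analysis.
    Medical image analysis was proposed long ago and has been studied for a long time, but it is now undergoing a revival: more and more researchers, whether they come from imaging or from medicine itself, are paying attention to computer vision, AI, and medical image analysis. The field has also given rise to quite a few startups, and its future is well worth watching. Beyond these, drones and autonomous driving also rely on computer vision technology.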

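    1.1 Images and Video: Concepts You Should Know

    • Images
      An image has properties such as number of dimensions, height, width, depth, number of channels, color format, data start address, end address, and data size.
      • Image depth: the number of bits used to store each pixel
        • The more bits a pixel uses, the more, and richer, colors it can represent.
        • Example: for a 400*400 8-bit image, how large is the raw data? If pixel values are integers, what is their range?
          1. Raw data size: 400 * 400 * (8/8) = 160,000 bytes (about 160 KB)
          2. Value range: 2^8 = 256 values, i.e., 0 to 255
      • Image formats and compression: common formats such as JPEG, PNG, and BMP are essentially ways of encoding images; JPEG (lossy) and PNG (lossless) are compressed, while BMP is usually stored uncompressed.
        • Example: JPEG compression
          1. Split the original image into 8*8 blocks, each containing 64 pixels.
          2. Apply a DCT (discrete cosine transform) to each 8*8 block (the more complex the image content, the harder it is to compress).
          3. After splitting, each block has a different complexity, so the final compression result differs from image to image.
    • Video
      Raw video = a sequence of images.
      Each ordered image in a video is called a "frame". Compressed video uses various algorithms to reduce the amount of data; the I/P/B frame scheme is the most common.
      • I-frame: a key frame, which can be thought of as a complete copy of the picture; decoding needs only this frame's own data (because it contains the complete picture).
      • P-frame: stores the difference between this frame and the preceding key frame (or P-frame). Decoding adds this frame's differences to the previously buffered picture to produce the final picture. (In other words, it is a difference frame: a P-frame has no complete picture data, only the differences from the previous frame.)
      • B-frame: a bidirectional difference frame that records the differences between this frame and both the preceding and following frames (the details are complicated, with 4 cases). To decode a B-frame you need not only the previously buffered picture but also the decoded following picture; the final picture is obtained by combining both with this frame's data. B-frames compress well but are more cumbersome to decode.
      • Bitrate: the higher the bitrate, the larger the file; the lower the bitrate, the smaller the file.
        Bitrate is the number of bits transmitted per unit of time, usually measured in kbps (kilobits per second). It is sometimes loosely called a "sampling rate", but it is not the same as the sampling frequency, which is measured in Hz and counts samples per second. The higher the bitrate, the higher the precision and the closer the encoded file is to the original, but file size is proportional to bitrate, so virtually every encoding format focuses on achieving the least distortion at the lowest bitrate. Two approaches grew out of this goal: CBR (constant bitrate) and VBR (variable bitrate). A higher bitrate gives a clearer picture; a lower one gives a coarse picture with blocky artifacts.
      • Frame rate
        Affects smoothness: the higher the frame rate, the smoother the motion; the lower the frame rate, the choppier it looks. With a variable bitrate, frame rate also affects file size: a higher frame rate means more pictures per second, which requires a higher bitrate and produces a larger file.
        Frame rate is the number of frames transmitted per second; it can also be understood as the number of times per second the graphics processor refreshes the picture.
      • Resolution
        Affects image size, in proportion: the higher the resolution, the larger the image; the lower the resolution, the smaller the image.
      • Clarity
        At a fixed bitrate, resolution and clarity are inversely related: the higher the resolution, the less clear the image (the same bits are spread over more pixels); the lower the resolution, the clearer the image.
        At a fixed resolution, bitrate and clarity are positively related: the higher the bitrate, the clearer the image; the lower the bitrate, the less clear the image.
      • Bandwidth and frame rate
        For example, suppose video is sent over an ADSL line whose uplink bandwidth is only 512 Kbps, but 4 channels of CIF-resolution video must be transmitted. The usual recommended bitrate for CIF is 512 Kbps, so by that measure only one channel fits, and lowering the bitrate would inevitably hurt image quality. To preserve image quality, the frame rate must be lowered instead: each frame then gets as many bits as before even at the lower bitrate, so image quality is unaffected, but the continuity of motion suffers.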
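    The raw-size arithmetic in the 400*400 example and the block-DCT step of JPEG can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available; the function names (`raw_size_bytes`, `value_range`, `dct_matrix`, `blockwise_dct`, `significant_coeffs`) are hypothetical, and the sketch covers only the split-and-transform step, not the quantization and entropy coding that a real JPEG encoder also performs.

```python
import numpy as np


def raw_size_bytes(width, height, bit_depth, channels=1):
    """Uncompressed size in bytes: width * height * channels * bit_depth / 8."""
    return width * height * channels * bit_depth // 8


def value_range(bit_depth):
    """Range of an unsigned integer pixel with the given bit depth."""
    return 0, 2 ** bit_depth - 1


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return c


def blockwise_dct(image, n=8):
    """Split a 2-D image into n x n blocks and apply a 2-D DCT to each block."""
    h, w = image.shape
    if h % n or w % n:
        raise ValueError("this sketch assumes dimensions divisible by n")
    c = dct_matrix(n)
    out = np.empty((h, w), dtype=float)
    for y in range(0, h, n):
        for x in range(0, w, n):
            block = image[y:y + n, x:x + n].astype(float)
            out[y:y + n, x:x + n] = c @ block @ c.T
    return out


def significant_coeffs(coeffs, threshold=1.0):
    """Count coefficients with magnitude above `threshold`: a rough proxy for
    how much information an encoder would still have to store."""
    return int(np.sum(np.abs(coeffs) > threshold))


if __name__ == "__main__":
    print(raw_size_bytes(400, 400, 8))  # 160000 bytes, about 160 KB
    print(value_range(8))               # (0, 255)

    flat = np.full((8, 8), 128)                                # a "simple" block
    noisy = np.random.default_rng(0).integers(0, 256, (8, 8))  # a "complex" block
    print(significant_coeffs(blockwise_dct(flat)))   # 1 (only the DC term)
    print(significant_coeffs(blockwise_dct(noisy)))  # far more than 1
```

    A flat block collapses into a single DC coefficient, while a noisy block spreads its energy over many coefficients. This is one way to see why complex image content is harder to compress.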

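    To make the I-frame/P-frame idea and the ADSL arithmetic concrete, here is a toy Python sketch built on simplifying assumptions. It is not how a real codec works: the "P-frames" here are plain pixel differences from the previous frame, whereas real encoders use motion compensation, quantization, and entropy coding, and B-frames are omitted entirely. The 25 fps reference frame rate in the example is an assumption, not a figure from the talk, and all function names are hypothetical.

```python
import numpy as np


def encode_ip(frames, gop=4):
    """Toy encoder: every `gop`-th frame is kept whole ("I-frame"); the rest
    store only the pixel difference from the previous frame ("P-frame")."""
    encoded = []
    for idx, frame in enumerate(frames):
        current = frame.astype(np.int16)  # signed, so differences can be negative
        if idx % gop == 0:
            encoded.append(("I", current))
        else:
            previous = frames[idx - 1].astype(np.int16)
            encoded.append(("P", current - previous))
    return encoded


def decode_ip(encoded):
    """Toy decoder: an I-frame is used directly; a P-frame is added to the
    previously decoded (buffered) frame."""
    decoded = []
    for kind, data in encoded:
        if kind == "I":
            decoded.append(data.copy())
        else:
            decoded.append(decoded[-1] + data)
    return decoded


def file_size_bytes(bitrate_kbps, seconds):
    """File size grows linearly with bitrate: kbps * 1000 * seconds / 8."""
    return bitrate_kbps * 1000 * seconds / 8


def fps_keeping_quality(total_kbps, channels, ref_kbps, ref_fps):
    """Frame rate at which each channel keeps the same bits per frame
    (ref_kbps / ref_fps) after the link is split across `channels`."""
    per_channel_kbps = total_kbps / channels
    return per_channel_kbps * ref_fps / ref_kbps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.integers(0, 200, (4, 4))
    frames = [base + t for t in range(6)]  # a slowly brightening toy scene
    decoded = decode_ip(encode_ip(frames))
    print(all(np.array_equal(d, f) for d, f in zip(decoded, frames)))  # True

    # ADSL example: 512 kbps uplink shared by 4 CIF channels. ASSUMPTION: the
    # recommended 512 kbps for one CIF channel is taken to be at 25 fps.
    print(fps_keeping_quality(512, 4, 512, 25))  # 6.25
```

    In the ADSL case each channel gets 512 / 4 = 128 kbps. Keeping the per-frame bit budget of 512 kbps at 25 fps then allows only 128 * 25 / 512 = 6.25 fps. This is the trade-off the example describes: per-frame image quality is preserved, but smooth motion is not.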
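    1.2 Cameras

    Types of cameras:

    • Surveillance cameras (network cameras and analog cameras)
    • Cameras for specific industry needs (ultra-wide dynamic range cameras, infrared cameras, thermal imaging cameras, etc.)
    • Smart cameras
    • Industrial cameras

    Current camera hardware can be divided into surveillance cameras, cameras for specialized industry applications, smart cameras, and industrial cameras. Among surveillance cameras, the two most widely used types today are network (IP) cameras and analog cameras, which differ mainly in their imaging principles.
    Network cameras generally offer higher definition than traditional analog cameras. Analog cameras are gradually being phased out and can be seen as the previous generation of surveillance cameras, while network cameras are the current mainstream. Around 2013, perhaps 70% to 80% or more of the cameras on the market were analog; now perhaps 60% to 70% are network cameras.
    Beyond these, different industries have their own specialized cameras, such as ultra-wide dynamic range, infrared, and thermal imaging cameras. They are used in specific domains, and the pictures they capture look completely different. For image processing and computer vision analysis, consider which camera best serves your purpose, and learn to exploit the strengths of the hardware.
    In research we can usually choose which camera to use; in real deployments we have less control. Still, keep in mind that some problems can be solved well simply by switching hardware. This is one way of thinking about it.
    Some problems may resist algorithmic solutions for a long time, or be solvable only with very poor efficiency and very high cost, yet a small change in hardware can make them disappear entirely. That is the new insight hardware can offer.
    There are also smart cameras and industrial cameras. Industrial cameras are generally expensive because they are designed specifically for industrial fields or for precision instruments that demand high accuracy and high definition.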

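    1.3 CPU and GPU

    Next, CPUs and GPUs. If you work on computer vision and image processing, you cannot avoid GPU computing, which is likely a topic you will need to learn next, possibly on your own.
    Most current computer vision papers are implemented on GPUs. In applications, however, GPUs are relatively expensive, so CPUs still account for most deployment scenarios.
    So where do CPUs and GPUs differ? They can be compared on two dimensions: performance and throughput.
    Here, performance is better described by another term: latency. Better performance means more efficient processing and therefore lower latency. Throughput, on the other hand, is the amount of data that can be processed at the same time.
    This is exactly where CPUs and GPUs differ. A CPU is built for high performance, that is, very low latency: it can perform complex computations quickly and meet demanding performance requirements. A GPU is organized around a large number of compute units, so its strength is not low latency. It is not good at complex computations, and each of its processors is very small and relatively weak, but it can run all of these weak processors at once and thus handle a large amount of data simultaneously, which means very high throughput. In short, the CPU emphasizes performance (low latency), and the GPU emphasizes throughput.
    That is why GPUs are usually associated with parallel computing: they can run a large number of threads at the same time. Why are images particularly well suited to GPU computation? Because the GPU was originally designed as a graphics processing unit: each pixel can be assigned to its own thread, and each pixel needs only a few simple operations. This is the principle behind the original graphics processors.
    In graphics rendering, what has to be computed is the transformation of each pixel. The computation per pixel is very small, perhaps a single formula, so it can be done in a simple compute unit. That is the difference between CPUs and GPUs.
    Based on this difference, we decide when to use a CPU and when to use a GPU. If your algorithm has little parallelism, that is, it is one complex computation from start to finish with few parts that can run in parallel, then even a GPU will not improve its performance much.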

    So do not use a GPU just because everyone else does. Understand why you would use a GPU, and in which situations it performs best.
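    The point about parallelism can be quantified with Amdahl's law, a standard formula that the talk itself does not mention. If a fraction p of the work can be parallelized across n units, the best possible speedup is 1 / ((1 - p) + p / n). The Python sketch below is illustrative only: it ignores real-world costs such as copying data between CPU and GPU memory, and the figure of 1000 parallel units is an arbitrary assumption.

```python
def amdahl_speedup(parallel_fraction, units):
    """Upper bound on speedup when only `parallel_fraction` of the work
    can be spread across `units` parallel processors (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if units < 1:
        raise ValueError("units must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / units)


if __name__ == "__main__":
    # An almost fully parallel per-pixel operation versus an algorithm that is
    # mostly one long sequential computation, both on 1000 hypothetical units.
    for p in (0.99, 0.5):
        print(p, round(amdahl_speedup(p, 1000), 2))
```

    With p = 0.5, no number of parallel units can push the speedup above 2. This is the talk's point: a GPU cannot rescue an algorithm that has little parallelism.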

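    1.4 Computer Vision and Other Disciplines

    Computer vision is related to many other disciplines, including robotics as well as the medicine, physics, imaging, and satellite image processing mentioned earlier, all of which frequently use computer vision. The most frequently asked question concerns three concepts: computer vision, machine vision, and image processing. How do they differ?
    The distinction varies from person to person; every researcher understands it differently.
    First, image processing is mostly about processing graphics and images at the pixel level, including 3D processing. Machine vision additionally involves hardware: the combination of software and hardware for graphics computation and intelligent image analysis is what we generally call machine vision.
    What we call computer vision today leans toward software-level processing, and it is not just image recognition: it also includes image understanding and even image transformation. Image generation, which is also being studied now, can be classified under computer vision as well.
    Computer vision is thus itself a fundamental discipline that can intersect with many others, and internally it is further subdivided, including machine vision and image processing.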

    1.5 Programming Languages and Mathematics

    For this part, see the article "How to Learn Computer Vision as a Non-Computer-Science Major".

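    2. Reference Books and Open Courses

    Reference books
    The first is "Computer Vision: Models, Learning, and Inference" by Simon J. D. Prince. It is best suited to beginners because it comes with a great deal of code, in both MATLAB and C, along with learning code, reference material, and papers, all very detailed. This makes it ideal for newcomers.

    The second is "Computer Vision: Algorithms and Applications" by Richard Szeliski, a classic and authoritative reference. This book is not meant to be read cover to cover but consulted, like a handbook; it has the broadest coverage of any reference, so it is generally used as a tool for looking things up.

    The third is 《OpenCV3编程入门》 (Getting Started with OpenCV3 Programming) by 毛星云 and 冷雪飞. If you want to get hands-on with projects quickly, this book teaches you to implement examples yourself while learning OpenCV, the most classic and widely used open-source computer vision library.

    Open courses:
    Stanford CS223B
    Good for fundamentals and for beginners. It is relatively light on deep learning: rather than devoting the whole course to deep learning, it focuses on computer vision and covers all of its aspects.

    Stanford CS231N
    This one hardly needs an introduction; many people know it. It combines computer vision with deep learning and is available on YouTube. It is taught by Fei-Fei Li; if you do not know her, look her up. In computer vision she is a leading figure in both industry and academia.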

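    3. Deep Learning Knowledge You Should Have

    There is not much to cover on deep learning here: not because the material is thin (there is a great deal of it), but because I will recommend only one book. Published at the end of last year, it is the newest deep learning book and is very comprehensive, covering everything from basic mathematics, including probability, statistics, calculus, and linear algebra, to machine learning.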

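    4. Open-Source Software to Know and Learn

    OpenCV
    A classic computer vision library that implements many commonly used computer vision algorithms and can help you get started quickly.
    Caffe
    For computer vision, Caffe is recommended. Caffe is best at convolutional neural networks, which are the most widely used networks in computer vision.
    So whatever other open-source software you learn later, Caffe is hard to avoid: once you understand Caffe, can use it, and can even modify its source code, you will find that your understanding of deep learning has made a qualitative leap.
    TensorFlow
    TensorFlow has become very popular lately, but its barrier to entry is high: learning to use it takes far longer than any of the others. It is also not yet especially mature or stable, so it changes frequently between versions, compatibility between versions is poor, and runtime efficiency still has a lot of room for improvement.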

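    5. How to Read the Literature

    First get familiar with how your area has developed, then read the milestone papers along the way closely.
    For example, for object detection with deep learning, you should certainly know R-CNN, Fast R-CNN, Faster R-CNN, SPPNet, SSD, and YOLO. For object tracking with deep learning, there are DLT, SO-DLT, and others.

    Top computer vision conferences:
    ICCV: International Conference on Computer Vision
    CVPR: IEEE Conference on Computer Vision and Pattern Recognition
    ECCV: European Conference on Computer Vision
    Besides the top conferences there are top journals, such as PAMI and IJCV, which represent the most cutting-edge work in the field and its current research directions.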

  • A Curated List of Computer Vision Papers

    2018-05-30 10:19:42
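    This article reviews the major developments in computer vision from 2012 to 2017, focusing on papers and other useful resources, with links to each. It covers more than a hundred papers, organized by direction: ImageNet classification, object detection, object tracking, object recognition, image and language, image generation, and more. The above...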

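    Classic papers

    Computer vision papers

    1. ImageNet classification
    2. Object detection
    3. Object tracking
    4. Low-level vision
    5. Edge detection
    6. Semantic segmentation
    7. Visual attention and saliency
    8. Object recognition
    9. Human pose estimation
    10. Principles and properties of CNNs (Understanding CNN)
    11. Image and language
    12. Image captioning
    13. Video captioning
    14. Image generation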

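    Microsoft ResNet

    Paper: Deep Residual Learning for Image Recognition

    Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

    Link: http://arxiv.org/pdf/1512.03385v1.pdf

    Microsoft PReLU (parametric rectified linear unit / weight initialization)

    Paper: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

    Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

    Link: http://arxiv.org/pdf/1502.01852.pdf

    Google Batch Normalization

    Paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Authors: Sergey Ioffe, Christian Szegedy

    Link: http://arxiv.org/pdf/1502.03167.pdf

    Google GoogLeNet

    Paper: Going Deeper with Convolutions, CVPR 2015

    Authors: Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

    Link: http://arxiv.org/pdf/1409.4842.pdf

    Oxford VGG-Net

    Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

    Authors: Karen Simonyan & Andrew Zisserman

    Link: http://arxiv.org/pdf/1409.1556.pdf

    AlexNet

    Paper: ImageNet Classification with Deep Convolutional Neural Networks

    Authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

    Link: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf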

    Object detection

    PVANET

    Paper: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

    Authors: Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park

    Link: http://arxiv.org/pdf/1608.08021

    NYU OverFeat

    Paper: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR 2014

    Authors: Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun

    Link: http://arxiv.org/pdf/1312.6229.pdf

    Berkeley R-CNN

    Paper: Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014

    Authors: Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

    Link: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

    Microsoft SPP

    Paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV 2014

    Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

    Link: http://arxiv.org/pdf/1406.4729.pdf

    Microsoft Fast R-CNN

    Paper: Fast R-CNN

    Authors: Ross Girshick

    Link: http://arxiv.org/pdf/1504.08083.pdf

    Microsoft Faster R-CNN

    Paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

    Authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun

    Link: http://arxiv.org/pdf/1506.01497.pdf

    Oxford R-CNN minus R

    Paper: R-CNN minus R

    Authors: Karel Lenc, Andrea Vedaldi

    Link: http://arxiv.org/pdf/1506.06981.pdf

    End-to-end people detection

    Paper: End-to-end People Detection in Crowded Scenes

    Authors: Russell Stewart, Mykhaylo Andriluka

    Link: http://arxiv.org/pdf/1506.04878.pdf

    Real-time object detection

    Paper: You Only Look Once: Unified, Real-Time Object Detection

    Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

    Link: http://arxiv.org/pdf/1506.02640.pdf

    Inside-Outside Net

    Paper: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

    Authors: Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

    Link: http://arxiv.org/abs/1512.04143

    Microsoft ResNet

    Paper: Deep Residual Learning for Image Recognition

    Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

    Link: http://arxiv.org/pdf/1512.03385v1.pdf

    R-FCN

    Paper: R-FCN: Object Detection via Region-based Fully Convolutional Networks

    Authors: Jifeng Dai, Yi Li, Kaiming He, Jian Sun

    Link: http://arxiv.org/abs/1605.06409

    SSD

    Paper: SSD: Single Shot MultiBox Detector

    Authors: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

    Link: http://arxiv.org/pdf/1512.02325v2.pdf

    Speed/accuracy trade-offs

    Paper: Speed/accuracy trade-offs for modern convolutional object detectors

    Authors: Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

    Link: http://arxiv.org/pdf/1611.10012v1.pdf

    Object tracking

    • Paper: Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network

    Authors: Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han

    Link: arXiv:1502.06796.

    • Paper: DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking

    Authors: Hanxi Li, Yi Li and Fatih Porikli

    Published: BMVC, 2014.

    • Paper: Learning a Deep Compact Image Representation for Visual Tracking

    Authors: N Wang, DY Yeung

    Published: NIPS, 2013.

    • Paper: Hierarchical Convolutional Features for Visual Tracking

    Authors: Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang

    Published: ICCV 2015

    • Paper: Visual Tracking with Fully Convolutional Networks

    Authors: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu

    Published: ICCV 2015

    • Paper: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

    Authors: Hyeonseob Nam and Bohyung Han

    Object recognition

    Paper: Weakly-supervised learning with convolutional neural networks

    Authors: Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic, CVPR, 2015

    Link:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf

    FV-CNN

    Paper: Deep Filter Banks for Texture Recognition and Segmentation

    Authors: Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, CVPR, 2015.

    Link:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf

    Human pose estimation

    • Paper: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

    Authors: Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, CVPR, 2017.

    • Paper: Deepcut: Joint subset partition and labeling for multi person pose estimation

    Authors: Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, CVPR, 2016.

    • Paper: Convolutional pose machines

    Authors: Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, CVPR, 2016.

    • Paper: Stacked hourglass networks for human pose estimation

    Authors: Alejandro Newell, Kaiyu Yang, and Jia Deng, ECCV, 2016.

    • Paper: Flowing convnets for human pose estimation in videos

    Authors: Tomas Pfister, James Charles, and Andrew Zisserman, ICCV, 2015.

    • Paper: Joint training of a convolutional network and a graphical model for human pose estimation

    Authors: Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, NIPS, 2014.

    Understanding CNNs

    • Paper: Understanding image representations by measuring their equivariance and equivalence

    Authors: Karel Lenc, Andrea Vedaldi, CVPR, 2015.

    Link:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf

    • Paper: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

    Authors: Anh Nguyen, Jason Yosinski, Jeff Clune, CVPR, 2015.

    Link:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf

    • Paper: Understanding Deep Image Representations by Inverting Them

    Authors: Aravindh Mahendran, Andrea Vedaldi, CVPR, 2015

    Link:
    http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf

    • Paper: Object Detectors Emerge in Deep Scene CNNs

    Authors: Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, ICLR, 2015.

    Link: http://arxiv.org/abs/1412.6856

    • Paper: Inverting Visual Representations with Convolutional Networks

    Authors: Alexey Dosovitskiy, Thomas Brox, arXiv, 2015.

    Link: http://arxiv.org/abs/1506.02753

    • Paper: Visualizing and Understanding Convolutional Networks

    Authors: Matthew Zeiler, Rob Fergus, ECCV, 2014.

    Link: http://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf

    Image and language

    Image captioning

    UCLA / Baidu

    Explain Images with Multimodal Recurrent Neural Networks

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, arXiv:1410.1090

    http://arxiv.org/pdf/1410.1090

    Toronto

    Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models

    Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, arXiv:1411.2539.

    http://arxiv.org/pdf/1411.2539

    Berkeley

    Long-term Recurrent Convolutional Networks for Visual Recognition and Description

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, arXiv:1411.4389.

    http://arxiv.org/pdf/1411.4389

    Google

    Show and Tell: A Neural Image Caption Generator

    Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, arXiv:1411.4555.

    http://arxiv.org/pdf/1411.4555

    Stanford

    Deep Visual-Semantic Alignments for Generating Image Descriptions

    Andrej Karpathy, Li Fei-Fei, CVPR, 2015.

    Web:http://cs.stanford.edu/people/karpathy/deepimagesent/

    Paper:http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

    UML / UT

    Translating Videos to Natural Language Using Deep Recurrent Neural Networks

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, NAACL-HLT, 2015.

    http://arxiv.org/pdf/1412.4729

    CMU / Microsoft

    Learning a Recurrent Visual Representation for Image Caption Generation

    Xinlei Chen, C. Lawrence Zitnick, arXiv:1411.5654.

    Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015

    http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf

    Microsoft

    From Captions to Visual Concepts and Back

    Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, CVPR, 2015.

    http://arxiv.org/pdf/1411.4952

    Univ. Montreal / Univ. Toronto

    Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention

    Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, arXiv:1502.03044 / ICML 2015

    http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf

    Idiap / EPFL / Facebook

    Phrase-based Image Captioning

    Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, arXiv:1502.03671 / ICML 2015

    http://arxiv.org/pdf/1502.03671

    UCLA / Baidu

    Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

    Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, arXiv:1504.06692

    http://arxiv.org/pdf/1504.06692

    MS + Berkeley

    Exploring Nearest Neighbor Approaches for Image Captioning

    Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, arXiv:1505.04467

    http://arxiv.org/pdf/1505.04467.pdf

    Language Models for Image Captioning: The Quirks and What Works

    Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, arXiv:1505.01809

    http://arxiv.org/pdf/1505.01809.pdf

    Adelaide

    Image Captioning with an Intermediate Attributes Layer

    Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, arXiv:1506.01144

    Tilburg

    Learning language through pictures

    Grzegorz Chrupala, Akos Kadar, Afra Alishahi, arXiv:1506.03694

    Univ. Montreal

    Describing Multimedia Content using Attention-based Encoder-Decoder Networks

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, arXiv:1507.01053

    Cornell

    Image Representations and New Domains in Neural Image Captioning

    Jack Hessel, Nicolas Savva, Michael J. Wilber, arXiv:1508.02091

    MS + City Univ. of Hong Kong

    Learning Query and Image Similarities with Ranking Canonical Correlation Analysis

    Ting Yao, Tao Mei, and Chong-Wah Ngo, ICCV, 2015

    Video captioning

    Berkeley

    Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

    UT / UML / Berkeley

    Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

    Microsoft

    Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

    UT / UML / Berkeley

    Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text, arXiv:1505.00487.

    Univ. Montreal / Univ. Sherbrooke

    Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

    MPI / Berkeley

    Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698

    Univ. Toronto / MIT

    Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

    University of Montreal

    Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

    TAU / USC

    Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

    Image Generation

    Convolutional / Recurrent Networks
    • Paper: Conditional Image Generation with PixelCNN Decoders

    Authors: Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu

    • Paper: Learning to Generate Chairs with Convolutional Neural Networks

    Authors: Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox

    Published: CVPR, 2015.

    • Paper: DRAW: A Recurrent Neural Network For Image Generation

    Authors: Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra

    Published: ICML, 2015.

    Adversarial Networks
    • Paper: Generative Adversarial Networks

    Authors: Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio

    Published: NIPS, 2014.

    • Paper: Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

    Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    Published: NIPS, 2015.

    • Paper: A note on the evaluation of generative models

    Authors: Lucas Theis, Aäron van den Oord, Matthias Bethge

    Published: ICLR 2016.

    • Paper: Variationally Auto-Encoded Deep Gaussian Processes

    Authors: Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence

    Published: ICLR 2016.

    • Paper: Generating Images from Captions with Attention

    Authors: Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov

    Published: ICLR 2016.

    • Paper: Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks

    Authors: Jost Tobias Springenberg

    Published: ICLR 2016.

    • Paper: Censoring Representations with an Adversary

    Authors: Harrison Edwards, Amos Storkey

    Published: ICLR 2016.

    • Paper: Distributional Smoothing with Virtual Adversarial Training

    Authors: Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin Ishii

    Published: ICLR 2016.

    • Paper: Generative Visual Manipulation on the Natural Image Manifold

    Authors: Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros

    Published: ECCV 2016.

    • Paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

    Authors: Alec Radford, Luke Metz, Soumith Chintala

    Published: ICLR 2016.

    Question Answering

    Virginia Tech / Microsoft Research

    Paper: VQA: Visual Question Answering, CVPR 2015 SUNw: Scene Understanding Workshop.

    Authors: Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

    MPI / Berkeley

    Paper: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

    Authors: Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

    Published: arXiv:1505.01121.

    Toronto

    Paper: Image Question Answering: A Visual Semantic Embedding Model and a New Dataset

    Authors: Mengye Ren, Ryan Kiros, Richard Zemel

    Published: arXiv:1505.02074 / ICML 2015 deep learning workshop.

    Baidu / UCLA

    Paper: Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

    Authors: Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu

    Published: arXiv:1505.05612.

    POSTECH (Korea)

    Paper: Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

    Authors: Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han

    Published: arXiv:1511.05765.

    CMU / Microsoft Research

    Paper: Stacked Attention Networks for Image Question Answering

    Authors: Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015)

    Published: arXiv:1511.02274.

    MetaMind

    Paper: Dynamic Memory Networks for Visual and Textual Question Answering

    Authors: Xiong, Caiming, Stephen Merity, and Richard Socher

    Published: arXiv:1603.01417 (2016).

    Seoul National University + NAVER

    Paper: Multimodal Residual Learning for Visual QA

    Authors: Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    Published: arXiv:1606.01455.

    UC Berkeley + Sony

    Paper: Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

    Authors: Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach

    Published: arXiv:1606.01847.

    POSTECH

    Paper: Training Recurrent Answering Units with Joint Loss Minimization for VQA

    Authors: Hyeonwoo Noh and Bohyung Han

    Published: arXiv:1606.03647.

    Seoul National University + NAVER

    Paper: Hadamard Product for Low-rank Bilinear Pooling

    Authors: Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

    Published: arXiv:1610.04325.

    Visual Attention and Saliency

    Paper: Predicting Eye Fixations using Convolutional Neural Networks

    Authors: Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu

    Published: CVPR, 2015.

    Learning a Sequential Search for Landmarks

    Paper: Learning a Sequential Search for Landmarks

    Authors: Saurabh Singh, Derek Hoiem, David Forsyth

    Published: CVPR, 2015.

    Multiple Object Recognition with Visual Attention

    Paper: Multiple Object Recognition with Visual Attention

    Authors: Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu

    Published: ICLR, 2015.

    Recurrent Models of Visual Attention

    Paper: Recurrent Models of Visual Attention

    Authors: Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

    Published: NIPS, 2014.

    Low-Level Vision

    Super-Resolution
    • Iterative Image Reconstruction

    Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.

    Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001.

    • Super-Resolution (SRCNN)

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.

    Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.

    • Very Deep Super-Resolution

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015.

    • Deeply-Recursive Convolutional Network

    Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015.

    • Cascade-Sparse-Coding-Network

    Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.

    • Perceptual Losses for Super-Resolution

    Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.

    • SRGAN

    Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016.

    Other Applications

    Optical Flow (FlowNet)

    Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.

    Compression Artifacts Reduction

    Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.

    Blur Removal

    Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444

    Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015

    Image Deconvolution

    Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.

    Deep Edge-Aware Filter

    Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.

    Computing the Stereo Matching Cost with a Convolutional Neural Network

    Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.

    Colorful Image Colorization

    Richard Zhang, Phillip Isola, Alexei A. Efros, Colorful Image Colorization, ECCV, 2016

    Feature Learning by Inpainting

    Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

    Edge Detection

    Holistically-Nested Edge Detection

    Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.

    DeepEdge

    Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.

    DeepContour

    Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

    Semantic Segmentation

    SEC: Seed, Expand and Constrain

    Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016.

    Adelaide

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. (1st ranked in VOC2012)

    Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. (4th ranked in VOC2012)

    Deep Parsing Network (DPN)

    Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 (2nd ranked in VOC 2012)

    CentraleSuperBoundaries, INRIA

    Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1511.07386 (4th ranked in VOC 2012)

    BoxSup

    Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)

    POSTECH

    Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. (7th ranked in VOC2012)

    Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924.

    Seunghoon Hong, Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928

    Conditional Random Fields as Recurrent Neural Networks

    Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)

    DeepLab

    Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. (9th ranked in VOC2012)

    Zoom-out

    Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015

    Joint Calibration

    Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.

    Fully Convolutional Networks for Semantic Segmentation

    Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.

    Hypercolumn

    Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.

    Deep Hierarchical Parsing

    Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015.

    Learning Hierarchical Features for Scene Labeling

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.

    Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.

    University of Cambridge

    Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.” arXiv preprint arXiv:1511.00561, 2015.

    Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding.” arXiv preprint arXiv:1511.02680, 2015.

    Princeton

    Fisher Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated Convolutions”, ICLR 2016

    Univ. of Washington, Allen AI

    Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015

    INRIA

    Iasonas Kokkinos, “Pushing the Boundaries of Boundary Detection Using Deep Learning”, ICLR 2016

    UCSB

    Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised graph based semantic segmentation by learning communities of image-parts”, ICCV, 2015

    Other Resources

    Courses

    Deep Vision

    [Stanford] CS231n: Convolutional Neural Networks for Visual Recognition

    [CUHK] ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)

    · More Deep Learning Courses

    [Stanford] CS224d: Deep Learning for Natural Language Processing

    [Oxford] Deep Learning by Prof. Nando de Freitas

    [NYU] Deep Learning by Prof. Yann LeCun

    Books

    Free Online Books

    Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

    Neural Networks and Deep Learning by Michael Nielsen

    Deep Learning Tutorial by LISA lab, University of Montreal

    Videos

    Talks

    Deep Learning, Self-Taught Learning and Unsupervised Feature Learning By Andrew Ng

    Recent Developments in Deep Learning By Geoff Hinton

    The Unreasonable Effectiveness of Deep Learning by Yann LeCun

    Deep Learning of Representations by Yoshua Bengio

    Software

    Frameworks
    • Tensorflow: An open source software library for numerical computation using data flow graphs by Google [Web]
    • Torch7: Deep learning library in Lua, used by Facebook and Google DeepMind [Web]
    • Torch-based deep learning libraries: [torchnet]
    • Caffe: Deep learning framework by the BVLC [Web]
    • Theano: Mathematical library in Python, maintained by LISA lab [Web]
    • Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
    • MatConvNet: CNNs for MATLAB [Web]
    • MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
    • Deepgaze: A computer vision library for human-computer interaction based on CNNs [Web]

    Applications

    • Adversarial Training: Code and hyperparameters for the paper “Generative Adversarial Networks” [Web]
    • Understanding and Visualization: Source code for “Understanding Deep Image Representations by Inverting Them,” CVPR, 2015. [Web]
    • Semantic Segmentation: Source code for the paper “Rich feature hierarchies for accurate object detection and semantic segmentation,” CVPR, 2014. [Web]; Source code for the paper “Fully Convolutional Networks for Semantic Segmentation,” CVPR, 2015. [Web]
    • Super-Resolution: Image Super-Resolution for Anime-Style-Art [Web]
    • Edge Detection: Source code for the paper “DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. [Web]
    • Edge Detection: Source code for the paper “Holistically-Nested Edge Detection”, ICCV 2015. [Web]

    Tutorials

    • [CVPR 2014] Tutorial on Deep Learning in Computer Vision
    • [CVPR 2015] Applied Deep Learning for Computer Vision with Torch

    Blogs

    • Deep down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision Blog
    • CVPR recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s Blog
    • Facebook’s AI Painting@Wired
    • Inceptionism: Going Deeper into Neural Networks@Google Research
    • Implementing Neural networks

    CVonline

    http://homepages.inf.ed.ac.uk/rbf/CVonline

    http://homepages.inf.ed.ac.uk/rbf/CVonline/unfolded.htm

    http://homepages.inf.ed.ac.uk/rbf/CVonline/CVentry.htm

    Works by Stan Z. Li:

    Markov Random Field Modeling in Computer Vision

    http://www.cbsr.ia.ac.cn/users/szli/mrf_book/book.html

    Handbook of Face Recognition (PDF)

    http://www.umiacs.umd.edu/~shaohua/papers/zhou04hfr.pdf


    Zhengyou Zhang's work on robust parameter estimation:

    Parameter Estimation Techniques: A Tutorial with Application to Conic Fitting

    http://research.microsoft.com/~zhang/INRIA/Publis/Tutorial-Estim/Main.html



    Andrea Fusiello's tutorial on geometry in computer vision: Elements of Geometric Computer Vision

    http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FUSIELLO4/tutorial.html#x1-520007


    Resources on Markov chain Monte Carlo (MCMC) methods:

    An introduction to Markov chain Monte Carlo

    http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/SENEGAS/mcmc.html

    Markov Chain Monte Carlo for Computer Vision: A tutorial at ICCV05

    http://civs.stat.ucla.edu/MCMC/MCMC_tutorial.htm

    Resources on Independent Component Analysis (ICA):

    An ICA-Page

    http://www.cnl.salk.edu/~tony/ica.html

    Fast ICA

    http://www.cis.hut.fi/projects/ica/fastica/

    The Kalman Filter (the definitive web page introducing the Kalman filter)

    http://www.cs.unc.edu/~welch/kalman/index.html

    Cached k-d tree search for ICP algorithms

    http://kos.informatik.uni-osnabrueck.de/download/3dim2007/paper.html


    Some computer vision research tools

    Machine Vision Toolbox for Matlab

    http://www.petercorke.com/Machine%20Vision%20Toolbox.html

    Matlab and Octave Function for Computer Vision and Image Processing

    http://www.csse.uwa.edu.au/~pk/research/matlabfns/

    Bayes Net Toolbox for Matlab

    http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html

    OpenCV (Chinese)

    http://www.opencv.org.cn/index.php/%E9%A6%96%E9%A1%B5

    Gandalf (A Computer Vision and Numerical Algorithm Library)

    http://gandalf-library.sourceforge.net/

    CMU Computer Vision Home Page

    http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/vision.html

    Machine Learning Resource Links

    http://www.cse.ust.hk/~ivor/resource.htm

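    The Bayesian Filtering Library (provides a framework for dynamic Bayesian networks, e.g., recursive information processing and estimation, Kalman filtering, particle filtering, and sequential Monte Carlo methods; written in C++)

    http://www.orocos.org/bfl

    Optical Flow Algorithm Evaluation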

    http://of-eval.sourceforge.net/

    MATLAB code for ICP algorithm

    http://www.usenet.com/newsgroups/comp.graphics.visualization/msg00102.html

    Homepages of leading researchers:

    Song-Chun Zhu

    http://www.stat.ucla.edu/~sczhu/

    David Lowe (SIFT) (a very handsome old gentleman ^ ^)

    http://www.cs.ubc.ca/~lowe/

    Andrea Vedaldi (SIFT)

    http://vision.ucla.edu/~vedaldi/index.html

    Pedro F. Felzenszwalb

    http://people.cs.uchicago.edu/~pff/

    Douglas Lanman (a graduate student at Brown who has collected many algorithm tutorials and source code on his homepage)

    http://mesh.brown.edu/dlanman/courses.html

    Jianbo Shi (originator of Normalized Cuts)

    http://www.cis.upenn.edu/~jshi/

    Active Vision Group (a machine vision research group at Oxford, focusing on SLAM, surveillance, and navigation)

    http://www.robots.ox.ac.uk/ActiveVision/index.html

    Juyang Weng (machine learning expert, best known for Autonomous Mental Development)

    http://www.cse.msu.edu/~weng/

    Test images and videos:

    Middlebury College's Stereo Vision Data Set

    http://cat.middlebury.edu/stereo/data.html

    Intelligent Vehicle:

    IVSource

    www.ivsource.net

    Robot Car

    http://www.plyojump.com/robot_cars.html

    How to Build a Robot: The Computer Vision Part

    http://www.societyofrobots.com/programming_computer_vision_tutorial.shtml

    posted @ 2011-03-31 11:36 Livid

    (Reposted) Computer Vision Open Source Algorithm Implementations

    Participate in Reproducible Research

    WARNING: this page is not and will never be exhaustive; it only tries to gather robust implementations of state-of-the-art Computer Vision algorithms.


    This material is presented to ensure timely dissemination of computer vision algorithms. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.

    General Image Processing

    OpenCV
    (C/C++ code, BSD lic) Image manipulation, matrix manipulation, transforms
    Torch3Vision
    (C/C++ code, BSD lic) Basic image processing, matrix manipulation and feature extraction algorithms: rotation, flip, photometric normalisations (Histogram Equalization, Multiscale Retinex, Self-Quotient Image or Gross-Brajovic), edge detection, 2D DCT, 2D FFT, 2D Gabor, PCA to do Eigen-Faces, LDA to do Fisher-Faces. Various metrics (Euclidean, Mahalanobis, ChiSquare, NormalizeCorrelation, TangentDistance, ...)
    GradientShop
    (C/C++ code, GPL lic) GradientShop: A Gradient-Domain Optimization Framework for Image and Video Filtering
    ImLab
    (C/C++ code, MIT lic) A Free Experimental System for Image Processing (loading, transforms, filters, histogram, morphology, ...)
    CImg
    (C/C++ code, GPL and LGPL lic) CImg Library is an open source C++ toolkit for image processing
    Generic Image Library (GIL) - boost integration
    (C/C++ code, MIT lic) Adobe open source C++ Generic Image Library (GIL)

    Image Acquisition, Decoding & encoding

    FFMPEG
    (C/C++ code, LGPL or GPL lic) Record, convert and stream audio and video (many codecs)
    OpenCV
    (C/C++ code, BSD lic) PNG, JPEG,... images, avi video files, USB webcam,...
    Torch3Vision
    (C/C++ code, BSD lic) Video file decoding/encoding (ffmpeg integration), image capture from a frame grabber or from USB, Sony pan/tilt/zoom camera control using VISCA interface
    lib VLC
    (C/C++ code, GPL lic) Used by VLC player: record, convert and stream audio and video
    Live555
    (C/C++ code, LGPL lic) RTSP streams
    ImageMagick
    (C/C++ code, GPL lic) Loading & saving DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, TIFF, and more
    DevIL
    (C/C++ code, LGPL lic) Loading & saving various image formats
    FreeImage
    (C/C++ code, GPL & FPL lic) PNG, BMP, JPEG, TIFF loading

    Segmentation

    OpenCV
    (C/C++ code, BSD lic) Pyramid image segmentation
    Branch-and-Mincut
    (C/C++ code, Microsoft Research Lic) Branch-and-Mincut Algorithm for Image Segmentation
    Efficiently solving multi-label MRFs (Readme)
    (C/C++ code) Segmentation, object category labelling, stereo

    Machine Learning

    Torch
    (C/C++ code, BSD lic) Gradient machines (multi-layered perceptrons, radial basis functions, mixtures of experts, convolutional networks and even time-delay neural networks), Support vector machines, Ensemble models (bagging, adaboost), Non-parametric models (K-nearest-neighbors, Parzen regression and Parzen density estimator), distributions (Kmeans, Gaussian mixture models, hidden Markov models, input-output hidden Markov models, and Bayes classifier), speech recognition tools

    Object Detection

    OpenCV
    (C/C++ code, BSD lic) Viola-Jones face detection (Haar features)
    Torch3Vision
    (C/C++ code, BSD lic) MLP & cascade of Haar-like classifiers face detection
    Hough Forests
    (C/C++ code, Microsoft Research Lic) Class-Specific Hough Forests for Object Detection
    Efficient Subwindow Object Detection
    (C/C++ code, Apache Lic) Christoph Lampert "Efficient Subwindow" algorithms for Object Detection

    Object Category Labelling

    Efficiently solving multi-label MRFs (Readme)
    (C/C++ code) Segmentation, object category labelling, stereo

    Optical flow

    OpenCV
    (C/C++ code, BSD lic) Horn & Schunck algorithm, Lucas & Kanade algorithm, Lucas-Kanade optical flow in pyramids, block matching
    GPU-KLT+FLOW
    (C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU

    Features Extraction & Matching

    SIFT by R. Hess
    (C/C++ code, GPL lic) SIFT feature extraction & RANSAC matching
    OpenSURF
    (C/C++ code) SURF feature extraction algorithm (kind of fast SIFT)
    ASIFT (from IPOL)
    (C/C++ code, Ecole Polytechnique and ENS Cachan for commercial Lic) Affine SIFT (ASIFT)
    VLFeat (formerly Sift++)
    (C/C++ code) SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift
    SiftGPU
    A GPU Implementation of Scale Invariant Feature Transform (SIFT)
    Groupsac
    (C/C++ code, GPL lic) An enhanced version of RANSAC that considers the correlation between data points
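
    Hess's SIFT package above pairs feature extraction with RANSAC matching, and GroupSAC is listed as a RANSAC variant. The sketch below illustrates the basic hypothesize-and-verify loop in pure Python. It is not code from either package. It fits a 2D line, and the function names, the line model, and the default `iterations` and `threshold` values are illustrative assumptions. For feature matching, the same loop would estimate a homography or fundamental matrix from putative correspondences instead.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1


def line_through(p: Point, q: Point) -> Optional[Line]:
    """Normalized line through two points, or None if the points coincide."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        return None  # degenerate minimal sample
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def ransac_line(points: Sequence[Point], iterations: int = 200,
                threshold: float = 0.1, seed: int = 0) -> Tuple[Optional[Line], List[Point]]:
    """Return the line with the largest consensus set, plus that set's inliers."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_line: Optional[Line] = None
    best_inliers: List[Point] = []
    for _ in range(iterations):
        # 1. Hypothesize: fit a model to a minimal random sample (2 points).
        line = line_through(*rng.sample(list(points), 2))
        if line is None:
            continue
        a, b, c = line
        # 2. Verify: keep points whose perpendicular distance is within threshold.
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        # 3. Keep the hypothesis with the most support.
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


if __name__ == "__main__":
    on_line = [(float(x), 2.0 * x + 1.0) for x in range(10)]  # y = 2x + 1
    outliers = [(2.0, 20.0), (5.0, -10.0), (8.0, 0.0)]
    line, inliers = ransac_line(on_line + outliers)
    print(line, len(inliers))
```

    Take 10 points on y = 2x + 1 plus a few gross outliers. Any two sampled inliers define exactly that line, so the consensus set should be the 10 collinear points.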

    Nearest Neighbors matching

    FLANN
    (C/C++ code, BSD lic) Approximate Nearest Neighbors (Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration)
    ANN
    (C/C++ code, LGPL lic) Approximate Nearest Neighbor Searching
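
    FLANN and ANN return approximate nearest neighbours in exchange for speed. The exact baseline they approximate is a brute-force linear search. The pure-Python sketch below assumes descriptors are plain lists of floats. It adds the distance-ratio test from Lowe's SIFT paper to reject ambiguous matches, using the commonly used threshold of 0.8. The function names are illustrative and are not part of either library.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_descriptors(
    query: List[Sequence[float]],
    train: List[Sequence[float]],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Exact brute-force matching with a nearest/second-nearest ratio test.

    Returns (query_index, train_index) pairs for matches that pass the test.
    """
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs at least two candidates
    for qi, q in enumerate(query):
        # Linear scan over every training descriptor: O(len(train)) per query.
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # keep only clearly unambiguous matches
            matches.append((qi, best))
    return matches


if __name__ == "__main__":
    train = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
    query = [[1.0, 0.0], [5.0, 0.0], [0.0, 9.0]]
    print(match_descriptors(query, train))  # prints [(0, 0), (2, 2)]
```

    The middle query is equidistant from two training descriptors, so the ratio test rejects it. An approximate library would replace the linear scan, not the ratio test.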

    Tracking

    OpenCV
    (C/C++ code, BSD lic) Kalman, Condensation, CAMSHIFT, Mean shift, Snakes
    KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker
    (C/C++ code, public domain) Kanade-Lucas-Tomasi Feature Tracker
    GPU_KLT
    (C/C++/OpenGL/Cg code) A GPU-based Implementation of the Kanade-Lucas-Tomasi Feature Tracker
    GPU-KLT+FLOW
    (C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU
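
    OpenCV's tracking functions listed above include a Kalman filter. The sketch below is a minimal scalar Kalman filter in pure Python that shows only the predict/update cycle. It is not OpenCV's `cv::KalmanFilter` API. It assumes a random-walk state model, and the noise variances `q` and `r` and initial values `x0` and `p0` are arbitrary illustrative choices. A real tracker would use a vector state (e.g. position and velocity) with matrix-valued covariances.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScalarKalman:
    x: float  # current state estimate
    p: float  # variance of the estimate
    q: float  # process noise variance (random-walk model)
    r: float  # measurement noise variance

    def predict(self) -> None:
        # Random-walk model x_k = x_{k-1} + w_k: the mean is unchanged,
        # but the uncertainty grows by the process noise.
        self.p += self.q

    def update(self, z: float) -> float:
        k = self.p / (self.p + self.r)  # Kalman gain in (0, 1)
        self.x += k * (z - self.x)      # blend prediction and measurement
        self.p *= 1.0 - k               # posterior variance shrinks
        return self.x


def filter_series(measurements: Iterable[float], x0: float = 0.0, p0: float = 1.0,
                  q: float = 1e-4, r: float = 0.25) -> List[float]:
    """Run predict/update over a sequence and return the filtered estimates."""
    kf = ScalarKalman(x=x0, p=p0, q=q, r=r)
    estimates: List[float] = []
    for z in measurements:
        kf.predict()
        estimates.append(kf.update(z))
    return estimates


if __name__ == "__main__":
    noisy = [5.3, 4.6, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1]
    print([round(e, 2) for e in filter_series(noisy)])
```

    A small `r` relative to `p` makes the filter trust measurements more; a larger `q` keeps the gain from decaying, which lets the estimate follow a drifting target.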

    Simultaneous localization and mapping

    Real-Time SLAM - SceneLib
    (C/C++ code, LGPL lic) Real-time vision-based SLAM with a single camera
    PTAM
    (C/C++ code, Isis Innovation Limited lic) Parallel Tracking and Mapping for Small AR Workspaces

    Camera Calibration & constraint

    OpenCV
    (C/C++ code, BSD lic) Chessboard calibration, calibration with rig or pattern
    Geometric camera constraint - Minimal Problems in Computer Vision
    Minimal problems in computer vision arise when computing geometrical models from image data. They often lead to solving systems of algebraic equations.
    Camera Calibration Toolbox for Matlab
    (Matlab toolbox) Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet (C implementation in OpenCV)

    Multi-View Reconstruction

    Bundle Adjustment - SBA
    (C/C++ code, GPL lic) A Generic Sparse Bundle Adjustment Package Based on the Levenberg-Marquardt Algorithm
    Bundle Adjustment - SSBA
    (C/C++ code, LGPL lic) Simple Sparse Bundle Adjustment (SSBA)

    Stereo

    Efficiently solving multi-label MRFs (Readme)
    (C/C++ code) Segmentation, object category labelling, stereo

    Structure from motion

    Bundler
    (C/C++ code, GPL lic) A structure-from-motion system for unordered image collections
    Patch-based Multi-view Stereo Software (Windows version)
    (C/C++ code, GPL lic) A multi-view stereo software that takes a set of images and camera parameters, then reconstructs 3D structure of an object or a scene visible in the images
    libmv - work in progress
    (C/C++ code, MIT lic) A structure from motion library

    Reposted from: http://www.cnblogs.com/loongfee/archive/2012/12/11/2813152.html
