While modifying code, an old include relationship was not removed, leaving two classes that reference each other:

    // classa.h
    #ifndef CLASSA
    #define CLASSA

    #include "classb.h"   // stale include left over from an earlier version

    class ClassA
    {
        void funcA();
    };

    #endif

    // classb.h
    #ifndef CLASSB
    #define CLASSB

    #include "classa.h"

    class ClassB
    {
        ClassA funcB();
    };

    #endif

At compile time, ClassB's definition ends up being expanded before ClassA's; since ClassB uses ClassA, the compiler reports that ClassA is not a type.

When a type is not recognized, follow the file-inclusion chain shown in the error message to check whether the include relationships are wrong; this is the fastest way to locate the problem.
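In this example the stale #include "classb.h" in classa.h can simply be deleted, since ClassA does not use ClassB at all. When two classes genuinely do need each other, the usual fix is a forward declaration; a minimal sketch (assuming funcB only needs the name ClassA in its declaration, so the full definition is included only in the .cpp file):

    // classb.h -- break the cycle with a forward declaration
    #ifndef CLASSB
    #define CLASSB

    class ClassA;           // forward declaration instead of #include "classa.h"

    class ClassB
    {
        ClassA funcB();     // declaring a return type only needs the class name
    };

    #endif

    // classb.cpp
    #include "classb.h"
    #include "classa.h"     // the full definition is needed to implement funcB

    ClassA ClassB::funcB() { return ClassA{}; }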


A cross reference tells you, for a given address, which address its data or code references, and which addresses' code reference it. What an address references can be read straight off the disassembly: one line of assembly naturally references only one address. Being referenced, however, is a one-to-many relation, just as one function may be called inside many others. Looking up "referenced by" is how call stacks are recovered in static analysis, though the one-to-many relation still leaves some guessing. Which direction matters depends on the goal of the analysis; compared with run-time dynamic analysis each has its strengths, and static analysis can produce the complete call graph.
In IDA, cross reference is abbreviated XREF.

XREFs come in two main kinds, data references and code references. Wherever an XREF appears in a semicolon comment, hovering the mouse over it shows part of the cross-referencing code. If a location is referenced from many places, the complete cross-reference list is available via the shortcut Ctrl+X or the menu. For example, right-clicking the Data XREF in the figure pops up a menu:

Choosing Jump to cross reference opens a dialog:


It is referenced by code at 9 locations. Select one and click OK to jump to that address.


Because Objective-C is a weakly typed language, functions and member variables can all be looked up by string, and developers invoke methods through selectors that are in effect strings. This complicates reference analysis during static analysis. Here is a demonstration.

Through class-dump or xdb, you can learn that UIView has the following private API:

    - (void)_addSubview:(id)arg1 positioned:(int)arg2 relativeTo:(id)arg3 
Now search for it in IDA. Because categories may exist, the search keyword for a specific method of a class usually omits the class name. This method is represented in IDA as
    __UIView_Internal___addSubview_positioned_relativeTo__
If you instead search for
    __UIView__addSubview_positioned_relativeTo__
you will certainly find nothing, so use the keyword
    addSubview_positioned_relativeTo

to search, clicking "search again" past duplicates until you reach the one that belongs to UIView. The header of its disassembly reads:

    __text:00059DAC ; =============== S U B R O U T I N E =======================================
    __text:00059DAC
    __text:00059DAC ; Attributes: bp-based frame
    __text:00059DAC
    __text:00059DAC __UIView_Internal___addSubview_positioned_relativeTo__ proc near
    __text:00059DAC                                         ; DATA XREF: __objc_const:0075DBE4o

To learn which functions call this one, the natural move is to inspect the DATA XREF above. However, 00059DAC is really just the IMP in the Objective-C runtime implementation, i.e. the address of the method's implementation (see the Xcode documentation). Developers invoke methods through selectors, so what gets referenced directly is the selector, never the IMP. During compilation the selector string is bound to a method, and the method to an IMP; the reverse-engineering search simply retraces that mapping backwards.

Checking the XREF at 00059DAC gives:


Jumping there, you find a pile of strings:


These three lines are in fact all the members of the Method struct:

    00000000 __objc2_meth    struc ; (sizeof=0xC)    ; XREF: __objc_const:000050C0r
    00000000                                         ; __objc_const:000050CCr ...
    00000000 name            DCD ?                   ; offset
    00000004 types           DCD ?                   ; offset
    00000008 imp             DCD ?                   ; offset
    0000000C __objc2_meth    ends

The first line is the starting address of the string, name, i.e. the string wrapped in @selector() in source code. The second line, types, describes the selector's parameter types; for the type encoding see "Three ways to hook functions with the Objective-C runtime". class-dump presumably relies on this types information to determine the parameter types of each selector. The third line is the IMP, which is why the XREF above jumps here.
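Conceptually, each entry can be read as the following struct (a sketch: field names and 32-bit offsets follow the IDA listing above, not the exact runtime headers):

    // One Objective-C method entry: maps a selector string to its implementation.
    struct objc2_meth {
        const char *name;   // 0x0: selector string, e.g. "_addSubview:positioned:relativeTo:"
        const char *types;  // 0x4: type-encoding string for the return and argument types
        void       *imp;    // 0x8: IMP, the implementation address (00059DAC above)
    };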

Because it is the selector that corresponds to the string, continue checking the XREF at 0075DBDC. This is a string pointer, hence an operand, so it works slightly differently from a function cross reference. Right-click, and a menu pops up:


Choose Jump to xref to operand. The menu also shows the shortcut: X, not the Ctrl+X used for function cross references. The difference is that Ctrl+X opens a non-modal dialog while X opens a modal one:


Three references show up here. The first two are ordinary references; for the last one IDA has worked out __objc_selrefs, meaning a selector references this string. So pick the reference in __objc_selrefs and jump there:


If you have done a lot of renaming, the code format here will look familiar: this is where the program's selectors live, and double-clicking a default-named selector such as off_XXXX in the disassembly lands you here.

Check the DATA XREF here:


A great many references appear; these are the functions that ultimately call through to 00059DAC.

The table above also shows that many public UIView interfaces, such as addSubview:, insertSubview:atIndex: and insertSubview:aboveSubview:, actually call this private API.




Abstract

The cross-referencing system is one of LaTeX's most powerful features: numbered structures, including headings, figures, tables and equations, can be referenced from anywhere in the document, and, more importantly, the reference numbers update automatically. This article gives an overview of the cross-referencing system and introduces some packages that extend it.

1. Introduction

LaTeX implements the \label and \ref macros, which make it possible to reference most numbered objects within the same document. First assign a unique identifier to a numbered object with \label; the object can then be referenced with \ref. A further, less commonly used command is \pageref, which prints the page number on which the referenced object appears.

In the example below, suppose the first section of the document is titled "Introduction" and carries the label sec:intro; later sections can then refer back to it through that label: \ref{sec:intro} prints the number 1.

    \section{Introduction}\label{sec:intro}
    % Content
    \section{Methods}
    In section \ref{sec:intro} on page
    \pageref{sec:intro} we introduced ...

2. Choosing labels

There are no formal restrictions on label names. A common convention, however, is to start the label with a prefix followed by a colon; the prefix helps the author recognize the type of the referenced object. For example, a figure label might be \label{fig:schema}. The table below suggests prefixes for the most commonly referenced objects.

Object        Prefix      Object        Prefix
Chapter       ch          Figure        fig
Section       sec         Table         tab
Subsection    ssec        List item     itm
Appendix      app         Equation      eqn

Using prefixes is good practice, but it is only a suggestion; authors are free to use a different set of prefixes or none at all.

3. Label placement

Ideally, \label should be placed immediately after the numbered object it refers to. For floating environments such as figures and tables, place \label inside or right after the \caption macro, since it is \caption that generates the figure or table number. Placing \label before the numbered object almost always yields wrong references, and placing it too long after the object can cause problems as well.
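A minimal placement sketch for a figure (file and label names are illustrative):

    \begin{figure}
      \centering
      \includegraphics{schema}
      \caption{System overview}\label{fig:schema} % label inside or right after \caption
    \end{figure}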

4. Generating and updating references

In general, the typesetting command must be run twice before the reference numbers appear. The mechanism: on the first run the system collects all label information and writes it to an auxiliary file; on the next run it reads that file and fills in the references. Wherever the system cannot resolve a reference, a double question mark ?? appears at the corresponding place in the output. If references are still wrong after another run, check the log file to pin down the problem; the next section covers reference-related warnings.

5. Reference-related warnings

Reference-related warnings in the log file are not uncommon. When label definitions are faulty, two kinds of warning typically appear: 1. undefined references; 2. multiply-defined labels.

The first occurs when a referenced label was never defined, the second when the same label is defined more than once. For example, if an author copies a code snippet (say, a figure environment) and forgets to rename the label inside it, a multiply-defined label results.

6. Packages extending cross-referencing

Many packages extend LaTeX's cross-referencing system. This article briefly introduces varioref, cleveref, hyperref and xr/xr-hyper, chosen because each extends the base functionality in its own distinct way. Broadly speaking, these packages do not redefine the standard referencing commands but define new ones, so the standard \label, \ref and \pageref remain unaffected.

6.1 The varioref package

The varioref package extends the basic referencing commands into slightly richer forms. \vref combines the functions of \ref and \pageref, printing both the number of the referenced object and the page it appears on; the page number is omitted when the object is on the same page. \vpageref extends \pageref: it prints "on this page" when the referenced object is on the same page, and the page number otherwise. Finally, the package provides \vrefrange and \vpagerefrange for printing number and page ranges when referencing several objects.
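A minimal varioref sketch (label names are illustrative):

    \usepackage{varioref}
    ...
    As shown in figure~\vref{fig:schema} ... % "figure 3 on page 7"; page part omitted if nearby
    The data \vpageref{tab:results} ...      % "on this page" or "on page 12"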

6.2 The cleveref package

Authors usually mention the type of the referenced object in the text; a figure, for example, may be referenced as "This is shown in figure 1". Since \ref prints only the number of the referenced object, this is exactly what cleveref is for: it automatically detects the type of the referenced object and prints the appropriate reference. For this the package provides the \cref command. Like varioref, it also implements \cpageref for printing the page of a referenced object and \cpagerefrange for page ranges.
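A minimal cleveref sketch (labels again illustrative; note that \cref also accepts several labels at once):

    \usepackage{cleveref}
    ...
    \cref{fig:schema} shows ...       % prints "figure 3"; the type word is added automatically
    see \cref{eqn:loss,tab:results}   % "equation 2 and table 1"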

6.3 The hyperref package

hyperref's main contribution to cross-referencing is turning references into links: clicking a reference jumps to the page containing the referenced object. Simply loading the package in the preamble enables this. The package also defines the \autoref macro, which behaves much like cleveref's \cref, printing both the reference number and its type. hyperref additionally implements a great deal of functionality unrelated to cross-referencing, which is beyond the scope of this article; its documentation is nonetheless worth browsing.

6.4 The xr/xr-hyper packages

The xr package (eXternal References) makes it possible to reference objects in other documents. This is particularly useful for scientific papers, where supplementary material is often provided alongside the main text: with xr, figures and tables in the supplement can be referenced from the paper. To enable this, declare the external document in the preamble of the main document with \externaldocument{filename}, including the path if the external .tex file lives in a different folder. Labels in the external document can then be referenced from the main text. Note that the external document must be recompiled after every change to its labels, so that the main text picks up the correct numbers.

    \documentclass{article}
    \usepackage{xr}
    \externaldocument{supplementary-materials}
    \begin{document}
    See supplementary figure \ref{fig:abc}.
    \end{document}

To additionally get hyperlinks on such references into external documents, use the xr-hyper package in place of xr.

6.5 Package loading order

Since all of these packages affect referencing behavior, loading several of them can cause conflicts, so they must be loaded in the correct order: (1) xr/xr-hyper, (2) varioref, (3) hyperref, and finally (4) cleveref.
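A preamble sketch with all four packages in the stated order:

    \usepackage{xr}        % or xr-hyper
    \usepackage{varioref}
    \usepackage{hyperref}
    \usepackage{cleveref}  % must be loaded last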

6.6 The showlabels package

Finally, the showlabels package deserves a mention: it prints label names in the margin of the output PDF, which makes it especially useful for keeping track of labels in long documents that contain many of them.


Reposted from: https://www.cnblogs.com/wenbosheng/p/9537774.html


This article, from Fudan University and published at AAAI 2019, presents the network with the highest accuracy on CASIA-B to date. The English original is quoted in full; the translation was machine-assisted but mostly manual, so corrections are welcome, and some passages are the translator's own understanding rather than original content. (Translated by Zhou Yueyuan.)

Gait, as a unique biometric feature, has broad applications in crime prevention, forensic identification and social security. To improve recognition accuracy and robustness in complex environments, the authors treat a gait as a set of independent frames and propose a new network, GaitSet, to learn identity information from the set. This article walks through the reasoning and the conclusions in detail.

Contents

Abstract
1. Introduction (Flexible; Fast; Effective)
2. Related work
  2.1 Gait recognition
  2.2 Deep learning on unordered sets
3. GaitSet
  3.1 Problem formulation
  3.2 Set Pooling (statistical functions; joint functions; attention)
  3.3 Horizontal Pyramid Mapping
  3.4 Multilayer Global Pipeline
  3.5 Training and testing
4. Experiments
  4.1 Datasets and training details (CASIA-B; OU-MVLP)
  4.2 Main results (CASIA-B: ST, MT, LT; OU-MVLP)
  4.3 Ablation experiments (Set vs. GEI; impact of SP; impact of HPM and MGP)
  4.4 Practicality (limited silhouettes; multiple views; multiple walking conditions)
5. Conclusion

「Abstract」

As a unique biometric feature that can be recognized at a distance, gait has broad applications in crime prevention, forensic identification and social security. To portray a gait, existing gait recognition methods utilize either a gait template, where temporal information is hard to preserve, or a gait sequence, which must keep unnecessary sequential constraints and thus loses the flexibility of gait recognition. In this paper we present a novel perspective, where a gait is regarded as a set consisting of independent frames, and we propose a new network named GaitSet to learn identity information from the set. Based on the set perspective, our method is immune to permutation of frames and can naturally integrate frames from different videos filmed under different scenarios, such as diverse viewing angles or different clothes and carrying conditions.

Experiments show that under normal walking conditions, our single-model method achieves an average rank-1 accuracy of 95.0% on the CASIA-B gait dataset and 87.1% on the OU-MVLP gait dataset, representing new state-of-the-art recognition accuracy. On various complex scenarios our model exhibits a significant level of robustness: it achieves accuracies of 87.2% and 70.4% on CASIA-B under bag-carrying and coat-wearing walking conditions respectively, outperforming the existing best methods by a large margin. The method can also achieve satisfactory accuracy with a small number of frames in a test sample, e.g., 82.5% on CASIA-B with only 7 frames.

「1」Introduction

Unlike other biometrics such as face, fingerprint and iris, gait is a unique biometric feature that can be recognized at a distance, without the cooperation of subjects or intrusion on them. Therefore, it has broad applications in crime prevention, forensic identification and social security. However, gait recognition suffers from exterior factors such as the subject's walking speed, dressing and carrying condition, and the camera's viewpoint and frame rate.

There are two main ways to identify gait in the literature: regarding gait as an image, and regarding gait as a video sequence. The first category compresses all gait silhouettes into one image, a gait template, for gait recognition. [Translator's note: the typical representative of the first category is the GEI, the Gait Energy Image; in the original figure, the last column is the GEI of the images in the preceding columns.]

Simple and easy to implement, a gait template easily loses temporal and fine-grained spatial information. By contrast, the second category, in recent years, extracts features directly from the original gait silhouette sequences. However, these methods are vulnerable to exterior factors, and deep neural networks for extracting sequential information, like the 3D-CNN, are harder to train than those using a single template like the Gait Energy Image.

To solve these problems, we present a novel perspective which regards gait as a set of gait silhouettes. As a periodic motion, gait can be represented by a single period, and in a silhouette sequence containing one gait period it was observed that the silhouette in each position has a unique appearance, as shown in Fig. 1.

Figure 1: From top left to bottom right, a complete period of silhouettes of one subject in the CASIA-B gait dataset.

Even if these silhouettes are shuffled, it is not difficult to rearrange them into the correct order just by observing their appearance. Thus, we assume the appearance of a silhouette contains its position information. Under this assumption, the order information of the gait sequence is not necessary as input, and we can directly regard gait as a set to extract temporal information.

We propose an end-to-end deep learning model called GaitSet, whose scheme is shown in Fig. 2.

Figure 2: The framework of GaitSet. 'SP' stands for Set Pooling. Trapezoids represent convolution and pooling blocks, and trapezoids in the same column have the same parameters, indicated by the rectangles with capital letters. Note that although the blocks in MGP have the same parameters as the corresponding blocks in the main pipeline, parameters are shared only among the blocks of the main pipeline, not with MGP. HPP stands for Horizontal Pyramid Pooling.

The input of our model is a set of gait silhouettes, like those in Fig. 1.

First, a CNN is used to extract frame-level features from each silhouette independently. Second, an operation called Set Pooling is used to aggregate the frame-level features into a single set-level feature. Since this operation is applied on high-level feature maps [Translator's note: the original silhouettes become high-level features after the convolutions] instead of the original silhouettes, it can preserve spatial and temporal information better than a gait template; this is justified by the experiment in Sec. 4.3. [Translator's note: the point seems to be that the process extracts each frame's spatial features while also extracting the temporal features of the whole sequence, so the features are more complete than what a gait template captures.]

Third, a structure called Horizontal Pyramid Mapping is used to map the set-level feature into a more discriminative space to obtain the final representation. [Translator's note: my reading of "discriminative space" is that the set-level feature, which contains both temporal and spatial information, is compressed into a one-dimensional feature convenient for the final fully connected classification.]

The superiorities of the proposed method are summarized as follows:

Flexible

Our model is quite flexible: there are no constraints on its input except the size of the silhouettes. This means the input set can contain any number of non-consecutive silhouettes filmed under different viewpoints and different walking conditions. Related experiments are shown in Sec. 4.4.

Fast

Our model directly learns the representation of gait instead of measuring the similarity between a pair of gait templates or sequences. Thus the representation of each sample needs to be calculated only once, and recognition is then completed by calculating the Euclidean distance between the representations of different samples.

Effective

Our model greatly improves the performance on the CASIA-B and OU-MVLP datasets, showing strong robustness to view and walking-condition variations and high generalization ability to large datasets.

「2」Related work

In this section, we give a brief survey of gait recognition and set-based deep learning methods.

2.1 Gait recognition

Gait recognition can be grouped into template-based and sequence-based categories.

Approaches in the former category first obtain the human silhouette of each frame by background subtraction. Second, they generate a gait template by rendering pixel-level operators on the aligned silhouettes. Third, they extract the representation of the gait by machine learning approaches such as Canonical Correlation Analysis (CCA), Linear Discriminant Analysis (LDA) and deep learning. Fourth, they measure the similarity between pairs of representations by Euclidean distance or some metric learning approach [Translator's note: a pair here is the representation of the input image set and that of a set stored from training]. Finally, they assign a label to the template by some classifier, e.g., a nearest neighbor classifier.

Previous works generally divide this pipeline into two parts: template generation and matching. The goal of generation is to compress gait information into a single image, e.g., the Gait Energy Image (GEI) and the Chrono-Gait Image (CGI).

In template matching approaches, the View Transformation Model (VTM) learns a projection between different views. (Hu et al. 2013) proposed View-invariant Discriminative Projection (ViDP) to project the templates into a latent space to learn a view-invariant representation. [Translator's note: on latent space, see https://www.quora.com/What-is-the-meaning-of-latent-space — essentially a space, of whatever dimensionality, in which objects of the same class lie closer together, which makes classification easier.]

Recently, as deep learning performs well on various generation tasks, it has been employed on the gait recognition task (Yu et al. 2017a; He et al. 2019; Takemura et al. 2018a; Shiraga et al. 2016; Yu et al. 2017b; Wu et al. 2017).

As the second category, video-based approaches directly take a sequence of silhouettes as input. Based on the way temporal information is extracted, they can be classified into LSTM-based approaches (Liao et al. 2017) and 3D CNN-based approaches (Wolf, Babaee, and Rigoll 2016; Wu et al. 2017). The advantages of these approaches are that 1) by focusing on each silhouette they obtain more comprehensive spatial information, and 2) they can gather more temporal information because specialized structures are utilized to extract sequential information. The price paid for these advantages, however, is high computational cost.

2.2 Deep learning on unordered sets

Most work in deep learning focuses on regular input representations like sequences and images. The concept of the unordered set was first introduced into computer vision by (Charles et al. 2017) (PointNet) to tackle point cloud tasks. Using unordered sets, PointNet can avoid the noise and the data expansion caused by quantization, and obtains a high performance. Since then, set-based methods have been widely used in the point cloud field (Wang et al. 2018c; Zhou and Tuzel 2018; Qi et al. 2017).

Recently, such methods were introduced into computer vision domains like content recommendation (Hamilton, Ying, and Leskovec 2017) and image captioning (Krause et al. 2017) to aggregate features in the form of a set. (Zaheer et al. 2017) further formalized deep learning tasks defined on sets and characterized the permutation invariant functions. To the best of our knowledge, set-based methods had not been employed in the gait recognition domain before this work.

    「3」GaitSet

In this section, we describe our method for learning discriminative information from a set of gait silhouettes. The overall pipeline is illustrated in Fig. 2.

3.1 Problem formulation

We begin by formulating our concept of regarding gait as a set. Given a dataset of N people with identities y_i, i ∈ 1, 2, ..., N, we assume the gait silhouettes of a certain person are subject to a distribution P_i related only to that identity. [Translator's note: i.e. a person's silhouettes correspond to that person and no other — the premise that makes gait recognition feasible: each person's gait is distinctive.] Therefore, all silhouettes in one or more sequences of a person can be regarded as a set of n silhouettes X_i = {x_i^j | j = 1, 2, ..., n}, where x_i^j ~ P_i.

[Translator's note, using CASIA-B as an example: the dataset has N = 124 subjects, each denoted y_i; for instance, several of subject 109's videos end before the person even appears, so in this paper's terms y_109's data are incomplete. How is an arbitrarily chosen silhouette written? If it is the 3rd frame in a sequence belonging to subject 20, it is x_20^3, and the set it belongs to is X_20.]

Under this assumption, we tackle the gait recognition task through three steps, formulated as

    f_i = H(G(F(X_i)))   (1)

where F is a convolutional network aiming to extract frame-level features from each gait silhouette; G is a permutation invariant function used to map a set of frame-level features to a set-level feature (Zaheer et al. 2017), implemented by an operation called Set Pooling (SP) introduced in Sec. 3.2; and H is used to learn the discriminative representation of P_i from the set-level feature [Translator's note: in effect, classifying the set-level feature to a person], implemented by a structure called Horizontal Pyramid Mapping (HPM, which the original paper once misspells as HMP) discussed in Sec. 3.3.

The input X_i is a tensor with four dimensions: the set dimension, the image channel dimension, the image height dimension and the image width dimension. [Translator's note: i.e. tensor.shape = (n frames, channels, height, width); for CASIA-B silhouettes this is (n, 1, 64, 44).]

    3.2 Set Pooling

The goal of Set Pooling (SP) is to aggregate the gait information of the elements in a set, formulated as z = G(V), where z denotes the set-level feature and V = {v_j | j = 1, 2, ..., n} denotes the frame-level features.

There are two constraints on this operation. First, to take a set as an input, G should be a permutation invariant function:

    G({v_j | j = 1, 2, ..., n}) = G({v_π(j) | j = 1, 2, ..., n})   (2)

where π is any permutation. Second, since in real-life scenarios the number of a person's gait silhouettes can be arbitrary, the function G should be able to take a set of arbitrary cardinality. [Translator's note: the set can be long or short, any number of frames — one of GaitSet's advertised advantages.]

Next, we describe several instantiations of G. The experiments will show that although different instantiations of SP do have some influence on performance, they do not differ greatly, and all of them exceed GEI-based methods by a large margin.

Statistical Functions

To meet the invariance constraint in Equ. 2, a natural choice for SP is to apply a statistical function on the set dimension. Considering representativeness and computational cost, we studied three statistical functions: max(·), mean(·) and median(·). The comparison is shown in Sec. 4.3.

Joint Function

We also studied two ways to join the three statistical functions mentioned above:

    G(V) = max(V) + mean(V) + median(V)                  (3)
    G(V) = 1_1C(cat(max(V), mean(V), median(V)))         (4)

where cat denotes concatenation on the channel dimension, 1_1C denotes a 1×1 convolutional layer, and max, mean and median are applied on the set dimension. Equ. 4 is an enhanced version of Equ. 3: the extra 1×1 convolutional layer can learn proper weights to combine the information extracted by the different statistical functions.

Attention

Since visual attention has been successfully applied to a great many tasks, we use it to improve the performance of SP. Its structure is shown in Fig. 3; the main idea is to utilize global information to learn an element-wise attention map for each frame-level feature map, refining it.

Figure 3: The structure of Set Pooling (SP) with attention. 1_1C and cat denote a 1×1 convolutional layer and concatenation respectively; multiplication and addition are element-wise.

Global information is first collected by the statistical functions on the left. It is then fed into a 1×1 convolutional layer along with the original feature map to calculate an attention map for the refinement. The final set-level feature z is extracted by applying MAX over the set of refined frame-level feature maps. The residual structure accelerates and stabilizes convergence.

    3.3 Horizontal Pyramid Mapping

In the literature, splitting the feature map into strips is commonly used in the person re-identification task: images are cropped and resized to a uniform size according to pedestrian size, whereas the discriminative parts vary from image to image. (Fu et al. 2018) proposed Horizontal Pyramid Pooling (HPP) to deal with this. HPP has 4 scales and thus helps the deep network focus on features of different sizes, gathering both local and global information. We improve HPP to adapt it better to the gait recognition task.

Instead of applying a 1×1 convolutional layer after the pooling, we use an independent fully connected layer (FC) for each pooled feature to map it into the discriminative space, as shown in Fig. 4. We call this Horizontal Pyramid Mapping (HPM).

Figure 4: The structure of HPM.

Specifically, HPM has S scales. On scale s ∈ 1, 2, ..., S, the feature map extracted by SP is split into 2^(s-1) strips on the height dimension, i.e. a total of 1 + 2 + ... + 2^(S-1) = 2^S - 1 strips. [Translator's note: for example, with S = 3 the feature map is split vertically at three scales into 1, 2 and 4 strips, i.e. 1 + 2 + 4 = 7 strips across all scales.]

Then a Global Pooling is applied to the 3-D strips to obtain 1-D features. For a strip z_s,t, where t ∈ 1, 2, ..., 2^(s-1) indexes the strip within scale s, the Global Pooling is formulated as f'_s,t = maxpool(z_s,t) + avgpool(z_s,t), where maxpool and avgpool denote Global Max Pooling and Global Average Pooling respectively. The two functions are used together because this outperforms applying either of them alone.

The final step employs FCs to map the features f' into a discriminative space. Since strips at different scales depict features of different receptive fields, and different strips within one scale depict features of different spatial positions, it comes naturally to use independent FCs, as shown in Fig. 4.

    3.4 Multilayer Global Pipeline

Different layers of a convolutional network have different receptive fields: the deeper the layer, the larger the receptive field. Thus, pixels in the feature maps of a shallow layer focus on local and fine-grained information, while those in a deeper layer focus on more global and coarse-grained information. The set-level features extracted by applying SP on different layers have the analogous property.

As shown in the main pipeline of Fig. 2, there is only one SP on the last layer of the convolutional network. To collect set information at various levels, the Multilayer Global Pipeline (MGP) is proposed. It has a structure similar to the convolutional network of the main pipeline, and the set-level features extracted in different layers are added into MGP. The final feature map generated by MGP is also mapped into features by HPM. Note that the HPM after MGP does not share parameters with the HPM after the main pipeline.

3.5 Training and testing

Training loss

As aforementioned, the output of the network is a set of features of dimension d. The corresponding features among different samples are used to compute the loss.

In this paper, the Batch All (BA+) triplet loss is employed to train the network (Hermans, Beyer, and Leibe 2017). [Translator's note: the BA+ triplet loss is described in Sec. 2, paragraph 6 of "In Defense of the Triplet Loss for Person Re-Identification".]

A batch of size p × k is sampled from the training set, where p denotes the number of persons in the batch and k the number of training samples per person.

Note that although the experiments show the model performs well when fed a set composed of silhouettes gathered from arbitrary sequences, a sample used for training is actually composed of silhouettes sampled from one sequence. [Translator's note: in other words, at test time silhouettes from any of a person's sequences may be mixed into one input, but during training each input draws its silhouettes from a single sequence.]
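A compact sketch of the Batch All scheme over such a p × k batch (illustrative, following Hermans, Beyer, and Leibe 2017; averaging only the non-zero terms mirrors the BA+ variant):

    import torch

    def batch_all_triplet_loss(feats, labels, margin=0.2):
        # feats: (N, d) embeddings of a p*k batch; labels: (N,) person ids.
        dist = torch.cdist(feats, feats)                        # pairwise Euclidean distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        pos = same & ~torch.eye(len(labels), dtype=torch.bool)  # positive pairs, diagonal excluded
        neg = ~same
        # every (anchor, positive, negative) combination: d(a,p) - d(a,n) + margin
        losses = (dist.unsqueeze(2) - dist.unsqueeze(1) + margin).clamp(min=0)
        losses = losses[pos.unsqueeze(2) & neg.unsqueeze(1)]
        nonzero = losses[losses > 0]
        return nonzero.mean() if nonzero.numel() > 0 else losses.sum()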

Testing

Given a query Q, the goal is to retrieve all sets with the same identity in the gallery set G. Denote a sample in G by g. Q is first fed into the GaitSet net to generate multiscale features, which are concatenated into a final representation Fq, as shown in Fig. 2. The same process is applied to each g to obtain Fg. Finally, Fq is compared against every Fg using the Euclidean distance to calculate the rank-1 recognition accuracy.
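A sketch of the rank-1 evaluation just described (illustrative; Fq/Fg are the concatenated representations):

    import torch

    def rank1_accuracy(Fq, q_labels, Fg, g_labels):
        # Each query matches the gallery sample at the smallest Euclidean distance.
        dist = torch.cdist(Fq, Fg)          # (num_query, num_gallery)
        nearest = dist.argmin(dim=1)
        return (g_labels[nearest] == q_labels).float().mean().item()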

「4」Experiments

Our empirical experiments mainly contain three parts. The first compares GaitSet with other state-of-the-art methods on two public gait datasets: CASIA-B (Yu, Tan, and Tan 2006) and OU-MVLP (Takemura et al. 2018b). The second is ablation experiments conducted on CASIA-B. The third investigates the practicality of GaitSet in three aspects: performance with limited silhouettes, with multiple views, and with multiple walking conditions.

4.1 Datasets and training details

CASIA-B

The CASIA-B dataset (Yu, Tan, and Tan 2006) is a popular gait dataset. It contains 124 subjects (labeled 001-124), 3 walking conditions and 11 views (0°, 18°, ..., 180°). The walking conditions are normal (NM, 6 sequences per subject), walking with a bag (BG, 2 sequences per subject) and wearing a coat or jacket (CL, 2 sequences per subject). That is, each subject has 11 × (6 + 2 + 2) = 110 sequences.

As there is no official partition of this dataset into training and test sets, we conduct experiments in three settings popular in the current literature, named small-sample training (ST), medium-sample training (MT) and large-sample training (LT). In ST the first 24 subjects (labeled 001-024) are used for training and the remaining 100 are left for testing; in MT the first 62 are used for training and the remaining 62 for testing; in LT the first 74 are used for training and the remaining 50 for testing.

In the test sets of all three settings, the first 4 sequences of the NM condition (NM #1-4) are kept in the gallery, and the remaining 6 sequences are divided into 3 probe subsets: the NM subset containing NM #5-6, the BG subset containing BG #1-2 and the CL subset containing CL #1-2.

OU-MVLP

The OU-MVLP dataset (Takemura et al. 2018b) is so far the world's largest public gait dataset. It contains 10,307 subjects, 14 views per subject (0°, 15°, ..., 90°; 180°, 195°, ..., 270°) and 2 sequences (#00-01) per view. The sequences are divided into training and test sets by subject (5153 subjects for training and 5154 for testing). In the test set, sequences with index #01 are kept in the gallery and those with index #00 are used as probes.

Training details

In all experiments, the input is a set of aligned silhouettes of size 64 × 44. The silhouettes are provided directly by the datasets and are aligned based on the method of (Takemura et al. 2018b). The set cardinality during training is set to 30. Adam is chosen as the optimizer (Kingma and Ba 2015). The number of scales S in HPM is set to 5. The margin of the BA+ triplet loss is set to 0.2. The models are trained with 8 NVIDIA 1080TI GPUs.

1) On CASIA-B, the mini-batch is composed in the manner introduced in Sec. 3.5 with p = 8 and k = 16. We set the number of channels in C1 and C2 to 32, in C3 and C4 to 64, and in C5 and C6 to 128. Under this setting, the average computational complexity of our model is 8.6 GFLOPs. The learning rate is set to 1e-4. For ST we train the model for 50K iterations, for MT 60K iterations, and for LT 80K iterations.

2) Since OU-MVLP contains 20 times more sequences than CASIA-B, we use convolutional layers with more channels (C1 = C2 = 64, C3 = C4 = 128, C5 = C6 = 256) and train with a larger batch size (p = 32, k = 16). The learning rate is 1e-4 for the first 150K iterations and is then reduced to 1e-5 for the remaining 100K iterations.

4.2 Main results

CASIA-B

Tab. 1 shows the comparison between state-of-the-art methods and our GaitSet. Except for ours, the results are taken directly from the original papers. All results are averaged over the 11 gallery views, excluding the identical view; for example, the accuracy for probe view 36° is averaged over 10 gallery views, excluding gallery view 36°.

An interesting pattern between views and accuracies can be observed in Tab. 1. Besides 0° and 180°, the accuracy at 90° is a local minimum, always worse than at 72° or 108°. The likely reason is that gait information contains not only components parallel to the walking direction, like stride, observed most clearly at 90°, but also components perpendicular to it, like the left-right swinging of the body or arms, observed most clearly at 0° or 180°. So both the parallel and the perpendicular perspectives lose part of the gait information, while views like 36° or 144° capture most of it.

Small-Sample Training (ST)

Our method achieves a high performance even with only 24 subjects in the training set, exceeding the best performance reported so far (Wu et al. 2017) by over 10 percent on the views they reported. There are two main reasons.

1) As our model regards the input as a set, the images used to train the convolutional network in the main pipeline are dozens of times more numerous than for models based on gait templates. Taking a mini-batch as an example, our model is fed 30 × 128 = 3840 silhouettes, while with the same batch size a model using gait templates only gets 128 templates.

2) Since the sample sets used in the training phase are composed of frames selected randomly from a sequence, each sequence in the training set can generate multiple different sets, so units related to set feature learning, like MGP and HPM, can also be trained well.

Medium-Sample Training (MT) & Large-Sample Training (LT)

Tab. 1 shows that our model obtains very good results on the NM subset, especially on LT, where the results of all views except 180° exceed 90%. On the BG and CL subsets, although the accuracies of some views like 0° and 180° are still not high, the mean accuracy of our model exceeds that of other models by at least 18.8%.

OU-MVLP

Tab. 3 shows our results. As some previous works did not conduct experiments on all 14 views, we list our results for two kinds of gallery sets: all 14 views, and 4 typical views (0°, 30°, 60°, 90°). All results are averaged over the gallery views, excluding the identical view. The results show that our method generalizes well on a dataset of such large scale and wide view variation. Further, since the representation of each sample needs to be calculated only once, our model can complete the test (containing 133,780 sequences) in only 7 minutes with 8 NVIDIA 1080TI GPUs. It is noteworthy that since some subjects are missing several gait sequences and we did not remove them from the probe set, the maximum rank-1 accuracy cannot reach 100%; if the cases with no corresponding samples in the gallery are ignored, the average rank-1 accuracy over all probe views is 93.3% rather than 87.1%.

4.3 Ablation experiments

Tab. 2 shows the full results of the ablation experiments, studying the effectiveness of every innovation in Sec. 3.

Set vs. GEI

The first two lines of Tab. 2 show the effectiveness of regarding gait as a set. With fully identical networks, using a set exceeds using a GEI by more than 10% on the NM subset and more than 25% on the CL subset; the only difference is that in the GEI experiment the gait silhouettes are averaged into a single GEI before being fed into the network.

There are two main reasons for this phenomenal improvement: 1) our SP extracts the set-level feature from high-level feature maps, where temporal information is well preserved and spatial information has been sufficiently processed; 2) as mentioned in Sec. 4.2, regarding gait as a set enlarges the volume of training data.

Impact of SP

In Tab. 2, the results from the third to the eighth line show the impact of different SP strategies: SP with attention, with the 1×1 convolution (1_1C) joint function, and with max(·) obtain the highest accuracy on the NM, BG and CL subsets respectively. Considering that SP with max(·) also achieves the second-best performance on the NM and BG subsets and has the most concise structure, we choose it as the SP in the final version of GaitSet.

Impact of HPM and MGP

The second and third lines of Tab. 2 compare the impact of independent weights in HPM: using independent weights improves the accuracy by about 2% on each subset, and in the experiments we also found that it helps the network converge faster. The last two lines of Tab. 2 show that MGP brings improvement on all three test subsets, consistent with the theory in Sec. 3.4 that set-level features extracted from different layers of the main pipeline contain different valuable information.

4.4 Practicality

Due to the flexibility of sets, GaitSet has great potential in more complicated practical conditions. In this section we investigate the practicality of GaitSet through three novel scenarios:

1) How does it perform when the input set contains only a few silhouettes?

2) Can silhouettes with different views enhance the identification accuracy?

3) Can the model effectively extract discriminative representations from a set containing silhouettes shot under different walking conditions?

It is worth noting that we did not retrain the model for these experiments; it is fully identical to the LT-setting model of Sec. 4.2. All experiments in this section that involve random selection were run 10 times, and average accuracies are reported.

Limited Silhouettes

In real forensic identification scenarios, there are cases where no continuous sequence of a subject's gait is available, only some fitful and sporadic silhouettes. We simulate such a circumstance by randomly selecting a certain number of frames from sequences to compose each sample in both the gallery and the probe set. Fig. 5 shows the relationship between the number of silhouettes in each input set and the rank-1 accuracy averaged over all 11 probe views.

Figure 5: Average rank-1 accuracy on CASIA-B (LT setting) as a function of the number of silhouettes. Accuracies are averaged over the 11 views excluding the identical view, and each point is the average of 10 runs.

Our method attains an 82% accuracy with only 7 silhouettes. The result also indicates that our model makes full use of the temporal information of gait, since 1) the accuracy rises monotonically with the number of silhouettes, and 2) the accuracy approaches the best performance once a sample contains more than 25 silhouettes, which matches the number of frames in one gait period.

Multiple Views

There are conditions under which different views of one person's gait can be gathered. We simulate these scenarios by constructing each sample from silhouettes selected from two sequences with the same walking condition but different views. To eliminate the effect of the silhouette count, we also conduct an experiment in which the number of silhouettes is limited to 10: in the single-view contrast experiments an input set is composed of 10 silhouettes from one sequence, while in the two-view experiment an input set is composed of 5 silhouettes from each of two sequences. Note that in this experiment only probe samples are composed in the way just described; each sample in the gallery is composed of all silhouettes from one sequence.

Tab. 4 shows the results. As there are too many view pairs to display, we summarize the results by averaging the accuracies of each possible view difference. For example, the result for a 90° difference is the average over the accuracies of 6 view pairs (0°&90°, 18°&108°, ..., 90°&180°). Further, the 9 view differences are folded at 90°, and those larger than 90° are averaged with the corresponding view differences smaller than 90°; for example, the results for an 18° view difference are averaged with those for a 162° difference.

It can be seen that our model can aggregate information from different views and boost the performance. This is explained by the pattern between views and accuracies discussed in Sec. 4.2: with multiple views in the input set, the model gathers both parallel and perpendicular information, improving performance.

Multiple Walking Conditions

In real life, gait sequences of the same person are quite likely to come from different walking conditions. We simulate such a condition by forming the input set with silhouettes from two sequences with the same view but different walking conditions, and we conduct experiments under different constraints on the number of silhouettes. Note that only probe samples are composed in this way; any sample in the gallery is constituted by all silhouettes from one sequence. Moreover, the probe-gallery division in this experiment is different: for each subject, sequences NM #02, BG #02 and CL #02 are kept in the gallery, and sequences NM #01, BG #01 and CL #01 are used as probes.

Tab. 5 shows the results. First, the accuracies are still boosted by increasing the number of silhouettes. Second, with the number of silhouettes fixed, the results reveal relationships between the different walking conditions. The silhouettes of BG and CL contain massive but different noise, which makes them complementary, so their combination can improve the accuracy. The silhouettes of NM, however, contain little noise, so substituting some of them with silhouettes from the other two conditions brings no extra information, only noise, and decreases the accuracy.

「5」Conclusion

In this paper we presented a novel perspective that regards gait as a set, and accordingly proposed the GaitSet approach. GaitSet can extract spatial and temporal information more effectively and efficiently than existing methods that regard gait as a template or a sequence. It also provides a novel way to aggregate valuable information from different sequences to enhance recognition accuracy. Experiments on two benchmark gait datasets indicate that, compared with other state-of-the-art algorithms, GaitSet achieves the highest recognition accuracy and exhibits a wide range of flexibility in various complex environments, showing great potential in practical applications. In the future we will investigate more effective instantiations of Set Pooling (SP) and further improve the performance in complex scenarios.
