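  • Python code for normalized mutual information (NMI), which can be used to evaluate community-detection results. The inputs are the algorithm's community partition and the ground-truth partition, both given as two-dimensional lists.
  • Continuous-value information entropy MATLAB code: NMI. Octave and R functions for computing the normalized multi-information (a multivariate version of mutual information) of binary signals. Copyright 2016 Kenneth Ball. Licensed under the Apache License, Version 2.0 (the "License"); unless you comply with the License, you may not use...
  • Code containing three external evaluation metrics commonly used in cluster analysis: adjusted Rand index (ARI), normalized mutual information (NMI), and accuracy (AC).
  • NMI index MATLAB code, README. This repo contains the algorithm and experiment code for the paper Learning Resolution Parameters for Graph Clustering, by Nate Veldt, David Gleich, Anthony Wirth, Proceedings of the 29th International World Wide Web Conference. arXiv preprint: Most of the code uses the Julia programming language...
  • NMI index MATLAB code. Repository name: "mri-prepro-nipype". This is a repository for preprocessing MRI images for use in different kinds of algorithms that detect stroke, lesions, and circle of Willis (CoW) types. (This branch will not be merged into master until the final diagnosis step is complete!...
  • Issues to watch for when multiplexing the NMI and Reset pins as GPIO on the Kinetis L series. Original work by jicheng0622.
  • NMI index MATLAB code: neural topic model via optimal transport. This repo contains a TensorFlow implementation of the ICLR paper [1]. Requirements: the code runs on TensorFlow 1.0 and should be easy to adapt to TensorFlow 2.0. The requirements are listed in the requirements.txt file. Please use ...
  • NMI index MATLAB code: Directional Co-clustering with a Conscience (DCC). The code in this repository implements the DCC co-clustering algorithm proposed in the paper by Aghiles Salah and Mohamed Nadif, SIAM International Conference on Data Mining, 2017. Usage example: the following code shows how we ... DCC ...
  • LFR benchmark graph generation code for complex networks, together with NMI code; includes the generated graphs and a detailed explanation. For personal use only.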
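  • NMI evaluation method

    2014-01-09 15:56:47
    NMI evaluation method (MATLAB): a function for evaluating classification and clustering. The inputs are the correct class labels and the labels produced by the experiment.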
  • Computing overlapping NMI

    2014-07-19 23:40:12
    Source code for computing overlapping normalized mutual information (NMI), for Linux. Reference: Lancichinetti A, Fortunato S, Kertész J. Detecting the overlapping and hierarchical community structure in complex networks [J]. New Journal of Physics, 2009, 11(3) ...
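  • Computing NMI

    2020-04-23 13:35:17
    NMI (normalized mutual information) is commonly used in clustering to measure the similarity between a clustering result and the ground truth of the data set. NMI takes values in [0, 1]. The larger the value, the closer the clustering result is to the ground truth and the better the clustering. If the algorithm's result...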

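    Introduction:

    • NMI (normalized mutual information) is commonly used in clustering to measure the similarity between a clustering result and the ground truth of the data set.
    • NMI takes values in [0, 1]. The larger the value, the closer the clustering result is to the ground truth and the better the clustering. If the algorithm's result is very poor, NMI is close to 0.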
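
    The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inputs are flat lists of cluster labels (one per sample), the mutual information is normalized by the arithmetic mean of the two entropies (other normalizations, such as the geometric mean or the maximum, are also in use), and the function names are made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a flat list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the entropies (an assumption)."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same samples")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        # Both labelings put every sample in a single cluster: treat as identical.
        return 1.0
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # same partition, labels renamed
    print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

    Renaming the cluster labels does not change the score, which is why NMI is convenient for comparing a clustering against ground truth.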

    Example: suppose that for 17 sample points $(v_1, v_2, \ldots, v_{17})$ ...
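  • A Python implementation of NMI computation
  • 8086汇编NMI.rar

    2019-11-24 14:51:43
    The interrupt system in 8086 assembly language. Suitable for major assignments and course projects on 8086 systems. Written in assembly language. A major assignment for Microcomputer Principles and Interface Technology.
  • MATLAB code for computing NMI

    2013-11-05 19:04:28
    MATLAB code for evaluating clustering: a program that computes clustering accuracy.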
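  • Strictly speaking, nmi_watchdog is an interrupt-based detection mechanism built on the non-maskable interrupt (NMI), a watchdog over kernel state. For an introduction, see nmi_watchdog.txt [NMI watchdog is available for x86 and x86-64 ...
    • Created by b178903294, last modified Sep 23, 2019

    Strictly speaking, nmi_watchdog is an interrupt-based detection mechanism: it is built on the non-maskable interrupt (NMI) and acts as a watchdog over kernel state. For an introduction, see nmi_watchdog.txt: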

    [NMI watchdog is available for x86 and x86-64 architectures]

    Is your system locking up unpredictably? No keyboard activity, just
    a frustrating complete hard lockup? Do you want to help us debugging
    such lockups? If all yes then this document is definitely for you.

    On many x86/x86-64 type hardware there is a feature that enables
    us to generate 'watchdog NMI interrupts'.  (NMI: Non Maskable Interrupt
    which get executed even if the system is otherwise locked up hard).
    This can be used to debug hard kernel lockups.  By executing periodic
    NMI interrupts, the kernel can monitor whether any CPU has locked up,
    and print out debugging messages if so.

    In order to use the NMI watchdog, you need to have APIC support in your
    kernel. For SMP kernels, APIC support gets compiled in automatically. For
    UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
    APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
    features -> IO-APIC support on uniprocessors) in your kernel config.
    CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
    CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
    kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
    may implicitly disable the NMI watchdog.]

    For x86-64, the needed APIC is always compiled in.

    Using local APIC (nmi_watchdog=2) needs the first performance register, so
    you can't use it for other purposes (such as high precision performance
    profiling.) However, at least oprofile and the perfctr driver disable the
    local APIC NMI watchdog automatically.

    To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
    parameter.  Eg. the relevant lilo.conf entry:

            append="nmi_watchdog=1"

    For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
    For UP machines without an IO-APIC use nmi_watchdog=2, this only works
    for some processor types.  If in doubt, boot with nmi_watchdog=1 and
    check the NMI count in /proc/interrupts; if the count is zero then
    reboot with nmi_watchdog=2 and check the NMI count.  If it is still
    zero then log a problem, you probably have a processor that needs to be
    added to the nmi code.

    A 'lockup' is the following scenario: if any CPU in the system does not
    execute the period local timer interrupt for more than 5 seconds, then
    the NMI handler generates an oops and kills the process. This
    'controlled crash' (and the resulting kernel messages) can be used to
    debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
    the oops will show up automatically. If the kernel produces no messages
    then the system has crashed so hard (eg. hardware-wise) that either it
    cannot even accept NMI interrupts, or the crash has made the kernel
    unable to print messages.

    Be aware that when using local APIC, the frequency of NMI interrupts
    it generates, depends on the system load. The local APIC NMI watchdog,
    lacking a better source, uses the "cycles unhalted" event. As you may
    guess it doesn't tick when the CPU is in the halted state (which happens
    when the system is idle), but if your system locks up on anything but the
    "hlt" processor instruction, the watchdog will trigger very soon as the
    "cycles unhalted" event will happen every clock tick. If it locks up on
    "hlt" then you are out of luck -- the event will not happen at all and the
    watchdog won't trigger. This is a shortcoming of the local APIC watchdog
    -- unfortunately there is no "clock ticks" event that would work all the
    time. The I/O APIC watchdog is driven externally and has no such shortcoming.
    But its NMI frequency is much higher, resulting in more significant hit
    to the overall system performance.

    On x86 nmi_watchdog is disabled by default so you have to enable it with
    a boot time parameter.

    It's possible to disable the NMI watchdog in run-time by writing "0" to
    /proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable
    the NMI watchdog. Notice that you still need to use "nmi_watchdog=" parameter
    at boot time.

    NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
    on x86 SMP boxes.
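
    The "check the NMI count in /proc/interrupts" step from the text above can be scripted. The following is a small sketch, not part of the kernel documentation: the helper name nmi_counts and the sample text are made up for illustration, and the parser assumes the usual layout in which the NMI row starts with "NMI:" followed by one counter per CPU.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text.

    Returns an empty list when no row starting with "NMI:" is found.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for token in fields[1:]:
                if not token.isdigit():
                    break  # the trailing description starts here
                counts.append(int(token))
            return counts
    return []


# Hypothetical sample in the usual /proc/interrupts layout (not a real capture).
SAMPLE = """\
           CPU0       CPU1
  0:         36          0   IO-APIC    2-edge      timer
NMI:         25         30   Non-maskable interrupts
LOC:     123456     120034   Local timer interrupts
"""

if __name__ == "__main__":
    counts = nmi_counts(SAMPLE)
    print(counts, "watchdog is ticking" if any(counts) else "NMI count is zero")
```

    On a real machine the same function could be fed the contents of /proc/interrupts, read twice a few seconds apart to see whether the counters advance.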

     

     

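    In short, the NMI watchdog depends on the APIC, and the mode to use depends on the platform's interrupt architecture. Our seewobook SN21 uses an Intel N3350, so it is an SMP system and we use nmi_watchdog=1, i.e. the I/O APIC watchdog. Its drawback is obvious: an NMI is raised on every timer tick, which has a noticeable performance cost. In the words of the document above, "...its NMI frequency is much higher, resulting in more significant hit to the overall system performance". It costs close to 1% of CPU performance, which is a lot for a monitoring facility, and its reliability is not great either.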

    For the principle behind lockup detection, see: https://blog.csdn.net/zhouhuacai/article/details/78046077

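    1. First, the principle behind soft lockup detection and a test:

    Soft lockup detection first registers a kernel thread called watchdog for every CPU core, i.e. [watchdog/0], [watchdog/1], [watchdog/2], ...

    The system also has a high-resolution timer, hrtimer (usually driven by the APIC), which periodically raises a timer interrupt. The handler for that interrupt is watchdog_timer_fn() in kernel/watchdog.c, and it:
    - increments the counter hrtimer_interrupts, which the hard lockup detector also uses to decide whether the CPU is still responding to interrupts;
    - wakes up the [watchdog/x] kernel thread, whose job is to update a timestamp;
    - runs the soft lockup check on that timestamp: if it has not been updated for longer than the soft lockup threshold, [watchdog/x] has not had a chance to run, which means the CPU is being hogged, i.e. a soft lockup has occurred.

    Note that the kernel thread [watchdog/x] only updates the timestamp, and the timestamp is the thing being watched. The real watchdog is watchdog_timer_fn(), which is triggered by the timer interrupt: [watchdog/x] is run by the scheduler, whereas watchdog_timer_fn() is run from the interrupt.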
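
    The interplay between the timestamp and the timer callback can be mimicked in user space. The sketch below is a simplified Python model, not kernel code: the class and function names, the sample period of 4 and the threshold of 20 (in arbitrary time units) are assumptions chosen for illustration, and the watchdog thread is modeled as running just before each timer check.

```python
class SoftLockupDetector:
    """Toy model of the soft lockup check: a timestamp plus a periodic timer."""

    def __init__(self, threshold):
        self.threshold = threshold    # allowed staleness of the timestamp
        self.touch_ts = 0             # last time the watchdog thread ran
        self.hrtimer_interrupts = 0   # bumped on every timer callback

    def watchdog_thread_runs(self, now):
        """Models [watchdog/x] getting CPU time and refreshing the timestamp."""
        self.touch_ts = now

    def timer_fn(self, now):
        """Models the timer callback: returns the stuck duration, or 0 if fine."""
        self.hrtimer_interrupts += 1
        stuck_for = now - self.touch_ts
        return stuck_for if stuck_for > self.threshold else 0


def first_report(hog_from, end, period=4, threshold=20):
    """Run the model; return (time, duration) of the first report, or None."""
    detector = SoftLockupDetector(threshold)
    for now in range(period, end + 1, period):
        if now < hog_from:
            # The CPU is not hogged yet, so the watchdog thread gets scheduled.
            detector.watchdog_thread_runs(now)
        duration = detector.timer_fn(now)
        if duration:
            return now, duration
    return None


if __name__ == "__main__":
    print(first_report(hog_from=12, end=40))   # thread starved from t=12 on
    print(first_report(hog_from=100, end=40))  # thread never starved
```

    The timer keeps firing while the thread is starved, which is exactly why the stale timestamp can be noticed at all; once interrupts are also blocked, this check no longer runs and the hard lockup detector has to take over.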
     

     

    Below is the main logic with which nmi_watchdog detects a soft lockup and triggers a panic:

    kernel/watchdog.c

    static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
    {
            unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
            struct pt_regs *regs = get_irq_regs();
            int duration;
            int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

            /* kick the hardlockup detector */
            watchdog_interrupt_count();

            /* test for hardlockups on the next cpu */
            watchdog_check_hardlockup_other_cpu();

            /* kick the softlockup detector */
            wake_up_process(__this_cpu_read(softlockup_watchdog));

            /* .. and repeat */
            hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

            /* debug output added for this test */
            printk(KERN_INFO
                   "now we test nmi_watchdog,touch_ts=%ld\n", touch_ts);

            if (touch_ts == 0) {
                    if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
                            /*
                             * If the time stamp was touched atomically
                             * make sure the scheduler tick is up to date.
                             */
                            __this_cpu_write(softlockup_touch_sync, false);
                            sched_clock_tick();
                    }

                    /* Clear the guest paused flag on watchdog reset */
                    kvm_check_and_clear_guest_paused();
                    __touch_watchdog();
                    return HRTIMER_RESTART;
            }

            /* check for a softlockup
             * This is done by making sure a high priority task is
             * being scheduled.  The task touches the watchdog to
             * indicate it is getting cpu time.  If it hasn't then
             * this is a good indication some task is hogging the cpu
             */

            /* debug output added for this test */
            printk(KERN_INFO
                   "that maybe occur softlockup,touch_ts=%ld  is_softlockup=%d\n",
                   touch_ts, is_softlockup(touch_ts));

            duration = is_softlockup(touch_ts);
            if (unlikely(duration)) {
                    /*
                     * If a virtual machine is stopped by the host it can look to
                     * the watchdog like a soft lockup, check to see if the host
                     * stopped the vm before we issue the warning
                     */
                    if (kvm_check_and_clear_guest_paused())
                            return HRTIMER_RESTART;

                    /* only warn once */
                    if (__this_cpu_read(soft_watchdog_warn) == true) {
                            /*
                             * When multiple processes are causing softlockups the
                             * softlockup detector only warns on the first one
                             * because the code relies on a full quiet cycle to
                             * re-arm.  The second process prevents the quiet cycle
                             * and never gets reported.  Use task pointers to detect
                             * this.
                             */
                            if (__this_cpu_read(softlockup_task_ptr_saved) !=
                                current) {
                                    __this_cpu_write(soft_watchdog_warn, false);
                                    __touch_watchdog();
                            }
                            return HRTIMER_RESTART;
                    }

                    if (softlockup_all_cpu_backtrace) {
                            /* Prevent multiple soft-lockup reports if one cpu is already
                             * engaged in dumping cpu back traces
                             */
                            if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
                                    /* Someone else will report us. Let's give up */
                                    __this_cpu_write(soft_watchdog_warn, true);
                                    return HRTIMER_RESTART;
                            }
                    }

                    pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
                            smp_processor_id(), duration,
                            current->comm, task_pid_nr(current));
                    __this_cpu_write(softlockup_task_ptr_saved, current);
                    print_modules();
                    print_irqtrace_events(current);
                    if (regs)
                            show_regs(regs);
                    else
                            dump_stack();

                    if (softlockup_all_cpu_backtrace) {
                            /* Avoid generating two back traces for current
                             * given that one is already made above
                             */
                            trigger_allbutself_cpu_backtrace();

                            clear_bit(0, &soft_lockup_nmi_warn);
                            /* Barrier to sync with other cpus */
                            smp_mb__after_atomic();
                    }

                    add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
                    if (softlockup_panic)
                            panic("softlockup: hung tasks");
                    __this_cpu_write(soft_watchdog_warn, true);
            } else
                    __this_cpu_write(soft_watchdog_warn, false);

            return HRTIMER_RESTART;
    }

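    The code above first checks if (touch_ts == 0) and, if so, refreshes the timestamp and returns. In our logs touch_ts stays 0 as long as no soft lockup has occurred (it is an unsigned long, so it cannot be negative), so the handler always returns early. Once touch_ts is greater than 0, as in the log, the if block is skipped and the code after it runs. The two statements duration = is_softlockup(touch_ts); and if (unlikely(duration)) determine how long the CPU has been stuck; a value greater than zero starts the reporting path. It first prints pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, current->comm, task_pid_nr(current)); and then, if softlockup_panic is set, calls panic("softlockup: hung tasks");. This matches the log we saw in testing. If the panic path itself does not hang, the system reboots and saves the related logs under /var/spool/crash/. The first thing to do after boot is to copy all logs out of this directory, because the system deletes them after a while; these logs are very helpful for locating the kernel code that locked up.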

    We build a kernel module in the kernel/kernel directory and load it after the system boots to test nmi_watchdog. The code is as follows:

    kernel/softlockup.c

    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/kthread.h>
    #include <linux/spinlock.h>

    struct task_struct *task0;
    static spinlock_t spinlock;
    int val;

    /* Thread body: takes the spinlock and then spins forever while holding it. */
    int task(void *arg)
    {
        printk(KERN_INFO "%s:%d\n", __func__, __LINE__);
        /* To generate panic uncomment following */
        /* panic("softlockup: hung tasks"); */

        while (!kthread_should_stop()) {
            printk(KERN_INFO "%s:%d\n", __func__, __LINE__);
            spin_lock(&spinlock);
            /* busy loop in critical section */
            while (1) {
                printk(KERN_INFO "%s:%d\n", __func__, __LINE__);
            }

            /* never reached: the loop above does not exit */
            spin_unlock(&spinlock);
        }

        return val;
    }

    static int softlockup_init(void)
    {
        printk(KERN_INFO "%s:%d\n", __func__, __LINE__);

        val = 1;
        spin_lock_init(&spinlock);
        task0 = kthread_run(&task, (void *)(long)val, "softlockup_thread");
        /* pin the hogging thread to CPU 0 */
        set_cpus_allowed_ptr(task0, cpumask_of(0));

        return 0;
    }

    static void softlockup_exit(void)
    {
        printk(KERN_INFO "%s:%d\n", __func__, __LINE__);
        kthread_stop(task0);
    }

    MODULE_LICENSE("GPL");
    module_init(softlockup_init);
    module_exit(softlockup_exit);

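    The code above was found online. It uses spin_lock() to disable preemption so that [watchdog/x] on that CPU cannot be scheduled, and uses set_cpus_allowed_ptr() to bind the thread to a specific CPU.

    Don't forget to add this line to kernel/Makefile: obj-m += softlockup.o. We build it as a loadable module rather than into the kernel image, because otherwise the system may keep triggering the soft lockup from boot onward and either hang or reboot in a loop, leaving us no way to read the log.

    After the build, we deploy the newly built kernel to the seewobook. After boot, running insmod /lib/modules/4.4.159/kernel/kernel/softlockup.ko starts the test.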

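    In testing, after the soft lockup is induced, nmi_watchdog reboots the system only some of the time, which shows that its reliability is indeed lacking. When detection succeeds, it reports the soft lockup and triggers the panic.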

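    2. Hard lockup: principle and test

    Detecting hard lockups is nmi_watchdog's main job. Hard lockup detection uses the PMU's NMI perf event: because an NMI cannot be masked, its handler still runs when the CPU has stopped responding to ordinary interrupts. The handler checks whether the timer-interrupt counter hrtimer_interrupts is still increasing; if it has stalled, timer interrupts are not being serviced, i.e. a hard lockup has occurred.

    The implementation reads the current hrtimer_interrupts and compares it with the previously saved value. If the two are equal, interrupts are blocked, i.e. a hard lockup has occurred, and the panic path begins. The code is as follows: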

    #ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
    /* watchdog detector functions */
    static bool is_hardlockup(void)
    {
            unsigned long hrint = __this_cpu_read(hrtimer_interrupts);

            /* debug output added for this test */
            printk(KERN_INFO
                   "now we test nmi_watchdog,hrtimer_intettupts_saved=%ld hrint=%ld\n",
                   __this_cpu_read(hrtimer_interrupts_saved), hrint);

            if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
                    return true;

            __this_cpu_write(hrtimer_interrupts_saved, hrint);
            return false;
    }
    #endif
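
    The counter comparison in is_hardlockup() is easy to try out in user space. The following Python sketch mirrors only that comparison; the class, the run() helper and the numbers of interrupts per interval are made up for illustration and nothing here touches the PMU or real NMIs.

```python
class HardLockupDetector:
    """Toy model of the is_hardlockup() counter comparison."""

    def __init__(self):
        self.hrtimer_interrupts = 0        # bumped by each timer interrupt
        self.hrtimer_interrupts_saved = 0  # value seen at the previous NMI check

    def timer_interrupt(self):
        """Models the timer interrupt being serviced."""
        self.hrtimer_interrupts += 1

    def is_hardlockup(self):
        """Models the check done from the NMI: has the counter moved?"""
        hrint = self.hrtimer_interrupts
        if self.hrtimer_interrupts_saved == hrint:
            return True
        self.hrtimer_interrupts_saved = hrint
        return False


def run(irqs_between_checks):
    """Deliver the given number of timer interrupts before each NMI check."""
    detector = HardLockupDetector()
    results = []
    for count in irqs_between_checks:
        for _ in range(count):
            detector.timer_interrupt()
        results.append(detector.is_hardlockup())
    return results


if __name__ == "__main__":
    # Two healthy intervals, then interrupts stop being serviced.
    print(run([3, 2, 0, 0]))  # [False, False, True, True]
```

    In the crash log later in this post the same situation shows up as hrtimer_intettupts_saved=25 hrint=25: the counter did not move between two checks.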

    The logic that triggers the panic is as follows:

    static void watchdog_overflow_callback(struct perf_event *event,
                       struct perf_sample_data *data,
                       struct pt_regs *regs)
    {
            /* Ensure the watchdog never gets throttled */
            event->hw.interrupts = 0;

            if (__this_cpu_read(watchdog_nmi_touch) == true) {
                    __this_cpu_write(watchdog_nmi_touch, false);
                    return;
            }

            /* check for a hardlockup
             * This is done by making sure our timer interrupt
             * is incrementing.  The timer interrupt should have
             * fired multiple times before we overflow'd.  If it hasn't
             * then this is a good indication the cpu is stuck
             */
            if (is_hardlockup()) {          /* the panic path is entered here */
                    int this_cpu = smp_processor_id();

                    /* only print hardlockups once */
                    if (__this_cpu_read(hard_watchdog_warn) == true)
                            return;

                    /* print the log line */
                    pr_emerg("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
                    print_modules();
                    print_irqtrace_events(current);
                    if (regs)
                            show_regs(regs);
                    else
                            dump_stack();

                    /*
                     * Perform all-CPU dump only once to avoid multiple hardlockups
                     * generating interleaving traces
                     */
                    if (sysctl_hardlockup_all_cpu_backtrace &&
                                    !test_and_set_bit(0, &hardlockup_allcpu_dumped))
                            trigger_allbutself_cpu_backtrace();

                    if (hardlockup_panic)
                            panic("Hard LOCKUP");   /* trigger the panic */

                    __this_cpu_write(hard_watchdog_warn, true);
                    return;
            }

            __this_cpu_write(hard_watchdog_warn, false);
            return;
    }

     

    Hard lockup test code:

    hardlockup.c

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/kthread.h>
    #include <linux/spinlock.h>

    MODULE_LICENSE("GPL");

    /* Thread body: disables local interrupts and spins forever. */
    static int
    hog_thread(void *data)
    {
        static DEFINE_SPINLOCK(lock);
        unsigned long flags;

        printk(KERN_INFO "Hogging a CPU now\n");
        /* spin_lock_irqsave() turns off interrupts on this CPU */
        spin_lock_irqsave(&lock, flags);
        while (1)
            ;

        /* unreached */
        return 0;
    }

    static int __init
    hog_init(void)
    {
        kthread_run(&hog_thread, NULL, "hog");
        return 0;
    }

    module_init(hog_init);

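    In this example interrupts are disabled, so ordinary interrupts (including the timer interrupt) cannot be serviced and threads cannot be scheduled. As a result, not only is the [watchdog/x] thread unable to run, the hrtimer callback cannot be serviced either. Build and install the module the same way as in the soft lockup test above.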

     

    The crash log after the hard lockup triggered a reboot is as follows:

    <6>[   59.148291] now we test nmi_watchdog,touch_ts=0

    <6>[   60.624320] now we test nmi_watchdog,hrtimer_intettupts_saved=25 hrint=25                // hrint is no longer advancing, so a hard lockup has occurred

    <0>[   60.624324] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0                         // the panic path begins; pr_emerg prints this line

    <4>[   60.624327] Modules linked in: hardlockup cmac rfcomm uinput snd_soc_sst_bxt_da7219_max98357a snd_soc_hdac_hdmi snd_soc_skl_ssp_clk snd_soc_dmic snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_sst_match snd_hda_ext_core snd_hda_core acpi_als snd_soc_max98357a snd_soc_da7219 bridge stp llc zram ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_mark v4l2loopback fuse ip6table_filter iio_trig_sysfs cros_ec_sensors cros_ec_sensors_ring cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf industrialio iwlmvm iwlwifi iwl7000_mac80211 cfg80211 btusb btrtl btbcm btintel uvcvideo videobuf2_vmalloc bluetooth videobuf2_memops videobuf2_v4l2 videobuf2_core joydev

    <4>[   60.624402] CPU: 0 PID: 2363 Comm: hog Not tainted 4.4.159 #163

    <4>[   60.624405] Hardware name: Google Coral/Coral, BIOS SN21.10176.68.2019_08_06_0950 03/02/2018

    <4>[   60.624408] task: ffff88016506d580 task.stack: ffff88006a67c000

    <4>[   60.624410] RIP: 0010:[<ffffffffc065a023>]  [<ffffffffc065a023>] hog_thread+0x23/0x1000 [hardlockup]

    <4>[   60.624418] RSP: 0018:ffff88006a67fe98  EFLAGS: 00000082

    <4>[   60.624420] RAX: 0000000000000282 RBX: ffff880175b7b840 RCX: 0000000000000001

    <4>[   60.624422] RDX: ffff88017fc11a60 RSI: ffff88017fc0f1c0 RDI: ffffffffc065c000

    <4>[   60.624425] RBP: ffff88006a67fe98 R08: 0000000000000021 R09: 0000000000000000

    <4>[   60.624427] R10: 00000000066c0a10 R11: ffffffff93c86d7c R12: ffff88006a67fef0

    <4>[   60.624429] R13: ffff88016506d580 R14: ffffffffc065a000 R15: 0000000000000000

    <4>[   60.624432] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000

    <4>[   60.624434] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

    <4>[   60.624436] CR2: 000078acb25cb450 CR3: 000000006a43c000 CR4: 0000000000340670

    <4>[   60.624438] Stack:

    <4>[   60.624441]  ffff88006a67ff48 ffffffff93acf3a7 0000000000000000 ffffffff941fd570

    <4>[   60.624447]  0000000000000000 ffffffff00000000 dead4ead00000000 00000000ffffffff

    <4>[   60.624453]  ffffffffffffffff ffff88006a67fee0 ffff88006a67fee0 88fa469600000000

    <4>[   60.624459] Call Trace:

    <4>[   60.624469]  [<ffffffff93acf3a7>] kthread+0xb9/0xc9

    <4>[   60.624474]  [<ffffffff941fd570>] ? __switch_to_asm+0x40/0x70

    <4>[   60.624478]  [<ffffffff93acf2ee>] ? rcu_read_unlock_sched_notrace+0x48/0x48

    <4>[   60.624481]  [<ffffffff941fd5ee>] ret_from_fork+0x4e/0x80

    <4>[   60.624485]  [<ffffffff93acf2ee>] ? rcu_read_unlock_sched_notrace+0x48/0x48

    <4>[   60.624487] Code: <eb> fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    <0>[   60.624522] Kernel panic - not syncing: Hard LOCKUP                                    // the panic that was triggered

    <4>[   60.624525] CPU: 0 PID: 2363 Comm: hog Not tainted 4.4.159 #163

    <4>[   60.624527] Hardware name: Google Coral/Coral, BIOS SN21.10176.68.2019_08_06_0950 03/02/2018

    <4>[   60.624529]  ffff88017fc09ef8 88fa469670f29fd2 ffff88017fc09a78 ffffffff93d2ade8

    <4>[   60.624535]  ffff88017fc09ab0 ffffffff93d2ad9a 0000000000000000 0000003000000010

    <4>[   60.624541]  0000000000000086 88fa469670f29fd2 ffffffff945e2d99 ffff88017fc09b38

    <4>[   60.624547] Call Trace:

    <4>[   60.624549]  <NMI>  [<ffffffff93d2ade8>] __dump_stack+0x19/0x1b

    <4>[   60.624557]  [<ffffffff93d2ad9a>] dump_stack+0x4f/0x84

    <4>[   60.624562]  [<ffffffff93ab6491>] panic+0xd1/0x221

    <4>[   60.624567]  [<ffffffff93b2aedd>] watchdog_overflow_callback+0xf7/0x102

    <4>[   60.624571]  [<ffffffff93b52426>] __perf_event_overflow+0x140/0x1ad

    <4>[   60.624574]  [<ffffffff93b522e4>] perf_event_overflow+0x18/0x1a

    <4>[   60.624579]  [<ffffffff93a65665>] intel_pmu_handle_irq+0x235/0x440

    <4>[   60.624586]  [<ffffffff93a5f39b>] perf_event_nmi_handler+0x2c/0x49

    <4>[   60.624589]  [<ffffffff93a4f554>] nmi_handle+0x72/0x160

    <4>[   60.624593]  [<ffffffffc065a001>] ? hog_thread+0x1/0x1000 [hardlockup]

    <4>[   60.624596]  [<ffffffff93a4f16f>] do_nmi+0x8b/0x36f

    <4>[   60.624599]  [<ffffffffc065a000>] ? 0xffffffffc065a000

    <4>[   60.624603]  [<ffffffff941ff4dd>] end_repeat_nmi+0x87/0x8f

    <4>[   60.624606]  [<ffffffffc065a000>] ? 0xffffffffc065a000

    <4>[   60.624611]  [<ffffffff93c86d7c>] ? ramoops_pstore_read+0x634/0x634

    <4>[   60.624615]  [<ffffffffc065a023>] ? hog_thread+0x23/0x1000 [hardlockup]

    <4>[   60.624618]  [<ffffffffc065a023>] ? hog_thread+0x23/0x1000 [hardlockup]

    <4>[   60.624622]  [<ffffffffc065a023>] ? hog_thread+0x23/0x1000 [hardlockup]

    <4>[   60.624624]  <<EOE>>  [<ffffffff93acf3a7>] kthread+0xb9/0xc9

    <4>[   60.624629]  [<ffffffff941fd570>] ? __switch_to_asm+0x40/0x70

    <4>[   60.624633]  [<ffffffff93acf2ee>] ? rcu_read_unlock_sched_notrace+0x48/0x48

    <4>[   60.624637]  [<ffffffff941fd5ee>] ret_from_fork+0x4e/0x80

    <4>[   60.624640]  [<ffffffff93acf2ee>] ? rcu_read_unlock_sched_notrace+0x48/0x48

    <0>[   60.624720] Kernel Offset: 0x12a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

    <0>[   60.626407] gsmi: Log Shutdown Reason 0x02

     

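    That completes the testing of nmi_watchdog's soft lockup and hard lockup detection. The feature works, but its reliability leaves something to be desired. nmi_watchdog is mainly meant for detecting lockups in kernel code and resetting the system, and is mostly used by kernel developers during development. It is generally not recommended on end-user machines, because it is not very stable and its overhead is somewhat high. User-space hangs, which are more common for us, are usually left to softdog and iTCO_wdt. softdog is a purely software watchdog, so it can die along with the system when the system crashes. iTCO_wdt is hardware-based and clearly more reliable than the other two, but a hardware configuration error on the SN21 has left iTCO_wdt unusable, so for now we can only rely on the two less stable watchdogs. When neither of them works, fall back to a manual reboot with F3+power.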
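  • I have been extremely busy lately. I kept collecting material and meaning to go on sharing technical experience, but never found the time. This weekend I can finally settle down and quietly write something. Right now I am sitting at my desk at the office with a cup of tea beside me, watching the setting sun outside the window, ahem, all of a sudden...
  • NMI_watchdog configuration notes

    2012-11-23 08:52:22
    The NMI watchdog triggers kdump, which records the crash log for analyzing the cause of the crash.
  • Entropy, cross-entropy, peak signal-to-noise ratio (PSNR), Qabf, average gradient, SSIM, mutual information, NMI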
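  • Understanding interrupts and exceptions in the Linux kernel in depth (4): the non-maskable interrupt (NMI), floating-point exceptions, and SIMD. rtoax, March 2021. This article covers the following traps: //* External hardware asserts the non-maskable interrupt [pin] on the CPU. //...
    Understanding interrupts and exceptions in the Linux kernel in depth (4): the non-maskable interrupt (NMI), floating-point exceptions, and SIMD


    rtoax
    March 2021

    This article covers the following traps: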

    //* External hardware asserts the non-maskable interrupt [pin] on the CPU.
    //* The processor receives a message on the system bus or the APIC serial bus with a delivery mode `NMI`.
    #define X86_TRAP_NMI	 2	/*  2, Non-maskable Interrupt (a serious condition) */
                                /**
                                 *  hardware interrupt
                                 *      exc_nmi  : arch/x86/kernel/nmi.c
                                 *      ()
                                 */

    #define X86_TRAP_BR		 5	/*  5, Bound Range Exceeded */
                                /**
                                 *      exc_bounds  : arch/x86/kernel/traps.c
                                 *      ()
                                 */
    #define X86_TRAP_MF		16	/* 16, x87 Floating-Point Exception */
                                /**
                                 *      exc_coprocessor_error  : arch/x86/kernel/traps.c
                                 *      ()
                                 */
    #define X86_TRAP_XF		19	/* 19, SIMD Floating-Point Exception */
                                /**
                                 *      exc_simd_coprocessor_error  : arch/x86/kernel/traps.c
                                 *      ()
                                 */
                                /* `SSE` or `SSE2` or `SSE3` SIMD floating-point exception */
                                //There are six classes of numeric exception conditions that
                                //can occur while executing an SIMD floating-point instruction:
                                //
                                //* Invalid operation
                                //* Divide-by-zero
                                //* Denormal operand
                                //* Numeric overflow
                                //* Numeric underflow
                                //* Inexact result (Precision)
    

    First, here is where X86_TRAP_NMI appears:

    arch/x86/mm/extable.c:194:	if (trapnr == X86_TRAP_NMI)
    arch/x86/kernel/idt.c:77:	INTG(X86_TRAP_NMI,		asm_exc_nmi),   //arch/x86/entry/entry_64.S
    arch/x86/kernel/idt.c:238:	ISTG(X86_TRAP_NMI,	asm_exc_nmi,			IST_INDEX_NMI), arch/x86/entry/entry_64.S
    arch/x86/platform/uv/uv_nmi.c:905:		ret = kgdb_nmicallin(cpu, X86_TRAP_NMI, regs, reason,
    arch/x86/include/asm/trapnr.h:52:#define X86_TRAP_NMI	 2	/*  2, Non-maskable Interrupt (a serious condition) */
    arch/x86/include/asm/idtentry.h:594:DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
    arch/x86/include/asm/idtentry.h:596:DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
    

    1. Non-maskable interrupt handler

    This is the sixth part of the Interrupts and Interrupt Handling in the Linux kernel chapter. In the previous part we saw the implementation of some exception handlers: the General Protection Fault exception, the divide exception, the invalid opcode exception, and so on. As I wrote in the previous part, this part covers the implementation of the remaining exceptions. We will see the implementation of the following handlers:

    • Non-Maskable interrupt;
    • BOUND Range Exceeded Exception;
    • Coprocessor exception;
    • SIMD coprocessor exception.

    So, let's start.

    2. Non-Maskable interrupt handling

    A Non-Maskable interrupt is a hardware interrupt that cannot be ignored by standard masking techniques. In general, a non-maskable interrupt can be generated in either of two ways:

    • External hardware asserts the non-maskable interrupt pin on the CPU.
    • The processor receives a message on the system bus or the APIC serial bus with a delivery mode NMI.

    When the processor receives an NMI from one of these sources, it handles it immediately by calling the NMI handler pointed to by interrupt vector number 2 (see the table in the first part).

    #define X86_TRAP_NMI	 2
    

    We already filled the Interrupt Descriptor Table with the vector number, address of the nmi interrupt handler and NMI_STACK Interrupt Stack Table entry:

    set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK);
    

    In 5.10.13:

    static const __initconst struct idt_data def_idts[] = {/* the default interrupt descriptor table entries */
    	...
    	INTG(X86_TRAP_NMI,		asm_exc_nmi),   //arch/x86/entry/entry_64.S
    	...
    };
    

    in the trap_init function, which is defined in the arch/x86/kernel/traps.c source code file. In the previous parts we saw that the entry points of all interrupt handlers are defined with the:

    .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
    ENTRY(\sym)
    ...
    ...
    ...
    END(\sym)
    .endm
    

    macro from the arch/x86/entry/entry_64.S assembly source code file. But the handler of the Non-Maskable interrupt is not defined with this macro. It has its own entry point:

    ENTRY(nmi)
    ...
    ...
    ...
    END(nmi)
    

    in the same arch/x86/entry/entry_64.S assembly file. Let's dive into it and try to understand how the Non-Maskable interrupt handler works. The nmi handler starts with a call of the:

    PARAVIRT_ADJUST_EXCEPTION_FRAME
    

    macro, but we will not go into details about it in this part, because this macro is related to Paravirtualization, which we will see in another chapter. After this we save the content of the rdx register on the stack:

    pushq	%rdx
    

    And check whether cs was the kernel segment when the non-maskable interrupt occurred:

    cmpl	$__KERNEL_CS, 16(%rsp)
    jne	first_nmi
    

    The __KERNEL_CS macro is defined in arch/x86/include/asm/segment.h and represents the second descriptor in the Global Descriptor Table:

    #define GDT_ENTRY_KERNEL_CS	2
    #define __KERNEL_CS	(GDT_ENTRY_KERNEL_CS*8)
    

    You can read more about the GDT in the second part of the Linux kernel booting process chapter. If cs is not the kernel segment, this is not a nested NMI and we jump to the first_nmi label. Let’s consider this case. In the first_nmi label we first reload rdx with the value we saved at the top of the stack and then push 1 onto the stack:

    first_nmi:
    	movq	(%rsp), %rdx
    	pushq	$1
    

    Why do we push 1 on the stack?

    In 5.10.13 this was changed to 0:

    first_nmi:
    	/* Restore rdx. */
    	movq	(%rsp), %rdx
    
    	/* Make room for "NMI executing". */
    	pushq	$0
    

    In short, this is how the kernel copes with another NMI arriving while an NMI is being handled.

    As the comment says: We allow breakpoints in NMIs. On x86_64, like other architectures, the CPU will not execute another NMI until the first NMI is completed. An NMI finishes with the iret instruction, as other interrupts and exceptions do. If the NMI handler triggers a page fault, a breakpoint or another exception, that exception also returns with the iret instruction. If this happens while in NMI context, the CPU will leave NMI context and a new NMI may come in.

    The iret used to return from those exceptions re-enables NMIs, and we can get nested non-maskable interrupts. The problem is that the NMI handler does not return to the state it was in when the exception triggered; instead it returns to a state that allows new NMIs to preempt the running NMI handler.

    If another NMI comes in before the first NMI handler is complete, the new NMI will write all over the preempted NMI’s stack. **We can have nested NMIs where the next NMI uses the top of the stack of the previous NMI. It means that we cannot simply execute it, because a nested non-maskable interrupt would corrupt the stack of the previous non-maskable interrupt.**

    That’s why we have allocated space on the stack for a temporary variable. We check whether this variable was set while a previous NMI was executing, and clear it if this is not a nested NMI. We push 1 here into the previously allocated space on the stack to denote that a non-maskable interrupt is currently executing. Remember that when an NMI or another exception occurs we have the following stack frame:

    +------------------------+
    |         SS             |
    |         RSP            |
    |        RFLAGS          |
    |         CS             |
    |         RIP            |
    +------------------------+
    

    and also an error code if an exception has it. So, after all of these manipulations our stack frame will look like this:

    +------------------------+
    |         SS             |
    |         RSP            |
    |        RFLAGS          |
    |         CS             |
    |         RIP            |
    |         RDX            |
    |          1             |
    +------------------------+
    

    In the next step we allocate another 40 bytes on the stack:

    subq	$(5*8), %rsp
    

    and push a copy of the original stack frame after the allocated space:

    .rept 5
    pushq	11*8(%rsp)
    .endr
    

    with the .rept assembly directive. We need a copy of the original stack frame. In fact, we need two copies of the interrupt stack frame:

    • The first is the saved stack frame. We push the original stack frame into the saved stack frame, which is located after the 40 bytes just allocated (those 40 bytes hold the copied stack frame). The saved frame is used to fix up the copied stack frame that a nested NMI may change.
    • The second is the copied stack frame, which any nested NMI modifies to let the first NMI know that a second NMI was triggered and that the first NMI handler should be repeated.

    Ok, we have made the first copy of the original stack frame; now it is time to make the second copy:

    addq	$(10*8), %rsp
    
    .rept 5
    pushq	-6*8(%rsp)
    .endr
    subq	$(5*8), %rsp
    

    After all of these manipulations our stack frame will be like this:

    +-------------------------+
    | original SS             |
    | original Return RSP     |
    | original RFLAGS         |
    | original CS             |
    | original RIP            |
    +-------------------------+
    | temp storage for rdx    |
    +-------------------------+
    | NMI executing variable  |
    +-------------------------+
    | copied SS               |
    | copied Return RSP       |
    | copied RFLAGS           |
    | copied CS               |
    | copied RIP              |
    +-------------------------+
    | Saved SS                |
    | Saved Return RSP        |
    | Saved RFLAGS            |
    | Saved CS                |
    | Saved RIP               |
    +-------------------------+
    

    After this we push a dummy error code on the stack, as we did in the previous exception handlers, and allocate space for the general purpose registers on the stack:

    pushq	$-1
    ALLOC_PT_GPREGS_ON_STACK
    

    We already saw the implementation of the ALLOC_PT_GPREGS_ON_STACK macro in the third part of the interrupts chapter. This macro is defined in arch/x86/entry/calling.h and allocates another 120 bytes on the stack for the general purpose registers, from rdi to r15:

    .macro ALLOC_PT_GPREGS_ON_STACK addskip=0
    addq	$-(15*8+\addskip), %rsp
    .endm
    

    After allocating space for the general purpose registers we see the call of paranoid_entry:

    call	paranoid_entry
    

    We remember this label from the previous parts. It pushes the general purpose registers on the stack, reads the MSR_GS_BASE model specific register and checks its value. If the value of MSR_GS_BASE is negative, we came from kernel mode and just return from paranoid_entry; otherwise we came from user mode and need to execute the swapgs instruction, which swaps the user gs with the kernel gs:

    ENTRY(paranoid_entry)
    	cld
    	SAVE_C_REGS 8
    	SAVE_EXTRA_REGS 8
    	movl	$1, %ebx
    	movl	$MSR_GS_BASE, %ecx
    	rdmsr
    	testl	%edx, %edx
    	js	1f
    	SWAPGS
    	xorl	%ebx, %ebx
    1:	ret
    END(paranoid_entry)
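
The sign test on MSR_GS_BASE can be modelled in a few lines. This is only a sketch in Python, not kernel code: the function names mirror the assembly labels, and the two GS base values in the usage note are made-up examples of a kernel-looking and a user-looking address.

```python
def needs_swapgs(gs_base):
    """Return True if the GS base looks like a user value (sign bit clear)."""
    edx = (gs_base >> 32) & 0xFFFFFFFF   # rdmsr returns the high half in edx
    return (edx & 0x80000000) == 0       # js is taken when the sign bit is set


def paranoid_entry(gs_base):
    """Return the ebx flag: 1 = swapgs was not executed, 0 = swapgs was executed."""
    ebx = 1                              # movl $1, %ebx
    if needs_swapgs(gs_base):
        ebx = 0                          # SWAPGS followed by xorl %ebx, %ebx
    return ebx


kernel_flag = paranoid_entry(0xFFFF888000000000)  # hypothetical kernel GS base
user_flag = paranoid_entry(0x00007F0000000000)    # hypothetical user GS base
print(kernel_flag, user_flag)
```

A kernel GS base lives in the upper half of the address space, so its top bit is set and the value reads as negative; that is the whole basis of the check.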
    

    Note that after the swapgs instruction we zero the ebx register.

    Later we will check the content of this register: if we executed swapgs then ebx contains 0, and 1 otherwise. In the next step we store the value of the cr2 control register in the r12 register, because the NMI handler can cause a page fault and corrupt the value of this control register:

    movq	%cr2, %r12
    

    Now it is time to call the actual NMI handler. We move the address of pt_regs to rdi and the error code to rsi, and call the do_nmi handler:

    movq	%rsp, %rdi
    movq	$-1, %rsi
    call	do_nmi
    

    In 5.10.13:

    call	exc_nmi
    

    We will come back to do_nmi a little later in this part, but first let’s look at what happens after do_nmi finishes its execution.

    After the do_nmi handler has finished we check the cr2 register, because we may have taken a page fault while do_nmi ran; if cr2 changed we restore the original value, otherwise we jump to the label 1. After this we test the content of the ebx register (remember it contains 0 if we have used the swapgs instruction and 1 if we did not) and execute SWAPGS_UNSAFE_STACK if it contains 0, or jump to the nmi_restore label if it contains 1.

    The SWAPGS_UNSAFE_STACK macro just expands to the swapgs instruction.

    In the nmi_restore label we restore the general purpose registers, release the stack space allocated for these registers, clear our temporary variable and exit from the interrupt handler with the INTERRUPT_RETURN macro:

    	movq	%cr2, %rcx
    	cmpq	%rcx, %r12
    	je	1f
    	movq	%r12, %cr2
    1:
    	testl	%ebx, %ebx
    	jnz	nmi_restore
    nmi_swapgs:
    	SWAPGS_UNSAFE_STACK
    nmi_restore:
    	RESTORE_EXTRA_REGS
    	RESTORE_C_REGS
    	/* Pop the extra iret frame at once */
    	REMOVE_PT_GPREGS_FROM_STACK 6*8
    	/* Clear the NMI executing stack variable */
    	movq	$0, 5*8(%rsp)
    	INTERRUPT_RETURN
    

    In 5.10.13 it looks like this:

    nmi_restore:    //EBX contains `0`
    	POP_REGS
    
    	/*
    	 * Skip orig_ax and the "outermost" frame to point RSP at the "iret"
    	 * at the "iret" frame.
    	 */
    	addq	$6*8, %rsp
    
    	/*
    	 * Clear "NMI executing".  Set DF first so that we can easily
    	 * distinguish the remaining code between here and IRET from
    	 * the SYSCALL entry and exit paths.
    	 *
    	 * We arguably should just inspect RIP instead, but I (Andy) wrote
    	 * this code when I had the misapprehension that Xen PV supported
    	 * NMIs, and Xen PV would break that approach.
    	 */
    	std
    	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
    
    	/*
    	 * iretq reads the "iret" frame and exits the NMI stack in a
    	 * single instruction.  We are returning to kernel mode, so this
    	 * cannot result in a fault.  Similarly, we don't need to worry
    	 * about espfix64 on the way back to kernel mode.
    	 */
    	iretq
    

    INTERRUPT_RETURN is defined in arch/x86/include/asm/irqflags.h and just expands to the iret instruction. That’s all.

    What happens if another NMI arrives before the current one has finished?

    Now let’s consider the case when another NMI occurs before the previous NMI has finished its execution. You may remember from the beginning of this part that we check whether we came from userspace and jump to first_nmi in that case:

    cmpl	$__KERNEL_CS, 16(%rsp)
    jne	first_nmi
    

    Note that in this case it is always the first NMI, because if the first NMI hit a page fault, a breakpoint or another exception, that exception would be executed in kernel mode. If we didn’t come from userspace, we first test our temporary variable:

    cmpl	$1, -8(%rsp)
    je	nested_nmi
    
    	/* This is a nested NMI. */
    
    nested_nmi: // start of the nested NMI path
    	/*
    	 * Modify the "iret" frame to point to repeat_nmi, forcing another
    	 * iteration of NMI handling.
    	 */
    	subq	$8, %rsp
    	leaq	-10*8(%rsp), %rdx
    	pushq	$__KERNEL_DS
    	pushq	%rdx
    	pushfq
    	pushq	$__KERNEL_CS
    	pushq	$repeat_nmi
    
    	/* Put stack back */
    	addq	$(6*8), %rsp
    
    nested_nmi_out: // end of the nested NMI path
    

    and if it is set to 1 we jump to the nested_nmi label. If it is not 1, we test the IST stack. In the case of nested NMIs we check whether we are above repeat_nmi. If so we ignore it; otherwise we check whether we are above end_repeat_nmi and jump to the nested_nmi_out label.

    Now let’s look at the do_nmi exception handler. This function is defined in the arch/x86/kernel/nmi.c source code file and takes two parameters:

    • address of the pt_regs;
    • error code.

    as all exception handlers do.

    This is clearly no longer the case in 5.10.13: some handlers take an error code and others do not.

    The do_nmi function starts with the call of the nmi_nesting_preprocess function and ends with the call of nmi_nesting_postprocess. The nmi_nesting_preprocess function checks whether we are running on the debug stack (which is unlikely); if we are, it sets the update_debug_stack per-cpu variable to 1 and calls the debug_stack_set_zero function from arch/x86/kernel/cpu/common.c. That function increases the debug_stack_use_ctr per-cpu variable and loads a new Interrupt Descriptor Table:

    static inline void nmi_nesting_preprocess(struct pt_regs *regs)
    {
        if (unlikely(is_debug_stack(regs->sp))) {
            debug_stack_set_zero();
            this_cpu_write(update_debug_stack, 1);
        }
    }
    

    The nmi_nesting_postprocess function checks the update_debug_stack per-cpu variable which we set in nmi_nesting_preprocess and resets the debug stack, or in other words it loads the original Interrupt Descriptor Table. After the call of the nmi_nesting_preprocess function, we can see the call of nmi_enter in do_nmi. nmi_enter increases the lockdep_recursion field of the interrupted process, updates the preempt counter and informs the RCU subsystem about the NMI. There is also the nmi_exit function, which does the same as nmi_enter but in reverse. After nmi_enter we increase __nmi_count in the irq_stat structure and call the default_do_nmi function. First of all, in default_do_nmi we compare the address of the previous NMI with the current one and update the address of the last NMI to the current one:

    if (regs->ip == __this_cpu_read(last_nmi_rip))
        b2b = true;
    else
        __this_cpu_write(swallow_nmi, false);
    
    __this_cpu_write(last_nmi_rip, regs->ip);
    

    After this we first handle CPU-specific NMIs:

    handled = nmi_handle(NMI_LOCAL, regs, b2b);
    __this_cpu_add(nmi_stats.normal, handled);
    

    And then non-specific NMIs, depending on their reason:

    reason = x86_platform.get_nmi_reason();
    if (reason & NMI_REASON_MASK) {
    	if (reason & NMI_REASON_SERR)
    		pci_serr_error(reason, regs);
    	else if (reason & NMI_REASON_IOCHK)
    		io_check_error(reason, regs);
    
    	__this_cpu_add(nmi_stats.external, 1);
    	return;
    }
    

    In 5.10.13 the handler is no longer called do_nmi; it is the following function:

    void exc_nmi(struct pt_regs *regs){/* added by me */}
    DEFINE_IDTENTRY_RAW(exc_nmi)
    {
    	bool irq_state;
    
    	/*
    	 * Re-enable NMIs right here when running as an SEV-ES guest. This might
    	 * cause nested NMIs, but those can be handled safely.
    	 */
    	sev_es_nmi_complete();
    
    	if (IS_ENABLED(CONFIG_SMP) && arch_cpu_is_offline(smp_processor_id()))
    		return;
    
    	if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
    		this_cpu_write(nmi_state, NMI_LATCHED);
    		return;
    	}
    	this_cpu_write(nmi_state, NMI_EXECUTING);
    	this_cpu_write(nmi_cr2, read_cr2());
    nmi_restart:
    
    	/*
    	 * Needs to happen before DR7 is accessed, because the hypervisor can
    	 * intercept DR7 reads/writes, turning those into #VC exceptions.
    	 */
    	sev_es_ist_enter(regs);
    
    	this_cpu_write(nmi_dr7, local_db_save());
    
    	irq_state = idtentry_enter_nmi(regs);
    
    	inc_irq_stat(__nmi_count);
    
    	if (!ignore_nmis)
    		default_do_nmi(regs);
    
    	idtentry_exit_nmi(regs, irq_state);
    
    	local_db_restore(this_cpu_read(nmi_dr7));
    
    	sev_es_ist_exit();
    
    	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
    		write_cr2(this_cpu_read(nmi_cr2));
    	if (this_cpu_dec_return(nmi_state))
    		goto nmi_restart;
    
    	if (user_mode(regs))
    		mds_user_clear_cpu_buffers();
    }
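
The nmi_state handling in exc_nmi is easier to follow as a small state machine. Below is a toy model in Python, not kernel code: the three constants use the values 0, 1 and 2 so that the decrement behaves like this_cpu_dec_return(nmi_state), and the nested_arrivals parameter is a hypothetical hook that pretends further NMIs arrive while the handler body is running.

```python
NMI_NOT_RUNNING, NMI_EXECUTING, NMI_LATCHED = 0, 1, 2


class CpuModel:
    """Toy model of the per-CPU nmi_state logic in exc_nmi."""

    def __init__(self):
        self.nmi_state = NMI_NOT_RUNNING
        self.runs = 0  # how many times the handler body was executed

    def exc_nmi(self, nested_arrivals=0):
        if self.nmi_state != NMI_NOT_RUNNING:
            # An NMI is already being handled: only remember that one arrived.
            self.nmi_state = NMI_LATCHED
            return
        self.nmi_state = NMI_EXECUTING
        while True:  # corresponds to the nmi_restart label
            self.runs += 1
            # Hypothetical hook: NMIs that arrive while the handler runs.
            for _ in range(nested_arrivals):
                self.exc_nmi()
            nested_arrivals = 0
            # LATCHED -> EXECUTING (run again), EXECUTING -> NOT_RUNNING (done).
            self.nmi_state -= 1
            if self.nmi_state == NMI_NOT_RUNNING:
                break


cpu = CpuModel()
cpu.exc_nmi(nested_arrivals=3)
print(cpu.runs, cpu.nmi_state)  # 2 0
```

Three nested arrivals still cause only one extra pass, because the latched state records that at least one NMI is pending, not how many.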
    

    Drastically simplified, exc_nmi becomes:

    void exc_nmi(struct pt_regs *regs){
    	if (!ignore_nmis)
    		default_do_nmi(regs);
    }
    

    default_do_nmi, drastically simplified:

    static noinstr void default_do_nmi(struct pt_regs *regs)
    {
        nmi_handle(NMI_LOCAL, regs);
    }
    

    This introduces the following data structure:

    struct nmiaction {
    	struct list_head	list;
    	nmi_handler_t		handler;
    	u64			max_duration;
    	unsigned long		flags;
    	const char		*name;
    };
    

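    The nmiaction structures are linked into a doubly linked list by the following structure:

    struct nmi_desc {   /* NMI: list head of the non-maskable interrupt handlers */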
    	raw_spinlock_t lock;
    	struct list_head head; /* struct nmiaction->list */
    };
    

    All that nmi_handle does is walk the list and invoke each callback:

    static int nmi_handle(unsigned int type, struct pt_regs *regs)
    {
    	struct nmi_desc *desc = nmi_to_desc(type);
    	struct nmiaction *a;
    	int handled=0;
    
    	rcu_read_lock();
    
    	/*
    	 * NMIs are edge-triggered, which means if you have enough
    	 * of them concurrently, you can lose some because only one
    	 * can be latched at any given time.  Walk the whole list
    	 * to handle those situations.
    	 */
    	list_for_each_entry_rcu(a, &desc->head, list) {
    		int thishandled;
    		u64 delta;
    
    		delta = sched_clock();
    		thishandled = a->handler(type, regs);
    		handled += thishandled;
    		delta = sched_clock() - delta;
    		trace_nmi_handler(a->handler, (int)delta, thishandled);
    
    		nmi_check_duration(a, delta);
    	}
    
    	rcu_read_unlock();
    
    	/* return total number of NMI events handled */
    	return handled;
    }
    NOKPROBE_SYMBOL(nmi_handle);
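
The control flow of nmi_handle can be sketched without any kernel machinery. The Python model below is an illustration only: the handler names are hypothetical, a plain list stands in for the RCU-protected linked list, and the timing and tracing calls are left out.

```python
def nmi_handle(handlers, nmi_type, regs):
    """Toy model of nmi_handle: call every registered handler, sum the results."""
    handled = 0
    for handler in handlers:  # stands in for list_for_each_entry_rcu(...)
        handled += handler(nmi_type, regs)
    return handled


# Hypothetical handlers: each returns how many NMI events it handled.
def perf_handler(nmi_type, regs):
    return 1


def watchdog_handler(nmi_type, regs):
    return 0


total = nmi_handle([perf_handler, watchdog_handler], "NMI_LOCAL", regs={})
print(total)  # 1
```

The loop deliberately does not stop at the first handler that claims the event, which matches the comment in the kernel source about walking the whole list.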
    

    That’s all.

    3. Range Exceeded Exception

    Exceeded: gone beyond (an amount); gone beyond the limit set by (a law, an order, etc.)

    The next exception is the BOUND range exceeded exception. The BOUND instruction determines whether the first operand (array index) is within the bounds of an array specified by the second operand (bounds operand). If the index is not within bounds, a BOUND range exceeded exception, or #BR, occurs. The handler of the #BR exception is the do_bounds function, which is defined in arch/x86/kernel/traps.c.
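
The check that BOUND performs can be written as a short Python sketch. It is a model of the semantics only: the exception class is a stand-in for #BR, and the function takes the lower and upper bounds as separate arguments instead of reading them from a memory operand.

```python
class BoundRangeExceeded(Exception):
    """Stands in for the #BR exception in this toy model."""


def bound(index, lower, upper):
    """Model of BOUND: fault unless lower <= index <= upper."""
    if index < lower or index > upper:
        raise BoundRangeExceeded(f"index {index} outside [{lower}, {upper}]")
    return index


print(bound(3, 0, 9))  # 3
```

Calling bound(10, 0, 9) raises BoundRangeExceeded, which corresponds to the processor delivering #BR to the handler described here.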

    In 5.10.13 the handler is:

    void exc_bounds(struct pt_regs *regs){/* added by me */}
    DEFINE_IDTENTRY(exc_bounds)
    {
    	if (notify_die(DIE_TRAP, "bounds", regs, 0,
    			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
    		return;
    	cond_local_irq_enable(regs);
    
    	if (!user_mode(regs))
    		die("bounds", regs, 0);
    
    	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, 0, 0, NULL);
    
    	cond_local_irq_disable(regs);
    }
    

    The do_bounds handler starts with the call of the exception_enter function and ends with the call of the exception_exit:

    prev_state = exception_enter();
    
    if (notify_die(DIE_TRAP, "bounds", regs, error_code,
    	           X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
        goto exit;
    ...
    ...
    ...
    exception_exit(prev_state);
    return;
    

    After we have got the state of the previous context, we pass the exception to the notify_die chain, and if it returns NOTIFY_STOP we return from the exception. You can read more about notify chains and the context tracking functions in the previous part. In the next step we enable interrupts if they were disabled, using the conditional_sti function, which checks the IF flag and calls local_irq_enable depending on its value:

    conditional_sti(regs);
    
    if (!user_mode(regs))
    	die("bounds", regs, error_code);
    

    and, if we did not come from user mode, we stop with the die function. After this we check whether MPX is enabled, and if this feature is disabled we jump to the exit_trap label:

    if (!cpu_feature_enabled(X86_FEATURE_MPX)) {
    	goto exit_trap;
    }
    
    where we execute the do_trap function (you can find more about it in the previous part):

    exit_trap:
    	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, NULL);
    	exception_exit(prev_state);
    

    If the MPX feature is enabled we check the BNDSTATUS with the get_xsave_field_ptr function, and if it is zero, MPX was not responsible for this exception:

    bndcsr = get_xsave_field_ptr(XSTATE_BNDCSR);
    if (!bndcsr)
    		goto exit_trap;
    

    After all of this, only one case remains: MPX is responsible for this exception. We will not dive into the details of Intel Memory Protection Extensions in this part, but will see them in another chapter.

    In 5.10.13 the handler calls do_trap, and do_trap goes on to complete the series of operations described above.

    4. Coprocessor exception and SIMD exception

    The next two exceptions are the x87 FPU Floating-Point Error exception, or #MF, and the SIMD Floating-Point Exception, or #XF. The first exception occurs when the x87 FPU has detected a floating point error, for example divide by zero, numeric overflow, etc. The second exception occurs when the processor has detected an SSE/SSE2/SSE3 SIMD floating-point exception. The causes can be the same as for the x87 FPU. The handlers for these exceptions, do_coprocessor_error and do_simd_coprocessor_error, are defined in arch/x86/kernel/traps.c and are very similar to each other. They both call the math_error function from the same source code file but pass different vector numbers. do_coprocessor_error passes the X86_TRAP_MF vector number to math_error:

    dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
    {
    	enum ctx_state prev_state;
    
    	prev_state = exception_enter();
    	math_error(regs, error_code, X86_TRAP_MF);
    	exception_exit(prev_state);
    }
    

    and do_simd_coprocessor_error passes X86_TRAP_XF to the math_error function:

    dotraplinkage void
    do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
    {
    	enum ctx_state prev_state;
    
    	prev_state = exception_enter();
    	math_error(regs, error_code, X86_TRAP_XF);
    	exception_exit(prev_state);
    }
    

    In 5.10.13:

    void exc_coprocessor_error(struct pt_regs *regs){/* added by me */}
    DEFINE_IDTENTRY(exc_coprocessor_error)
    {
    	math_error(regs, X86_TRAP_MF);
    }
    void exc_simd_coprocessor_error(struct pt_regs *regs){/* added by me */}
    DEFINE_IDTENTRY(exc_simd_coprocessor_error)
    {
    	if (IS_ENABLED(CONFIG_X86_INVD_BUG)) {
    		/* AMD 486 bug: INVD in CPL 0 raises #XF instead of #GP */
    		if (!static_cpu_has(X86_FEATURE_XMM)) {
    			__exc_general_protection(regs, 0);
    			return;
    		}
    	}
    	math_error(regs, X86_TRAP_XF);
    }
    

    First of all the math_error function defines the current interrupted task, the address of its fpu and a string which describes the exception, passes the exception to the notify_die chain and returns from the exception handler if it returns NOTIFY_STOP:

    	struct task_struct *task = current;
    	struct fpu *fpu = &task->thread.fpu;
    	siginfo_t info;
    	char *str = (trapnr == X86_TRAP_MF) ? "fpu exception" :
    						"simd exception";
    
    	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, SIGFPE) == NOTIFY_STOP)
    		return;
    

    After this we check whether we came from kernel mode, and if so we try to fix the exception with the fixup_exception function. If we cannot, we fill the task with the exception’s error code and vector number and die:

    if (!user_mode(regs)) {
    	if (!fixup_exception(regs)) {
    		task->thread.error_code = error_code;
    		task->thread.trap_nr = trapnr;
    		die(str, regs, error_code);
    	}
    	return;
    }
    

    If we came from user mode, we save the fpu state, fill the task structure with the vector number of the exception and fill siginfo_t with the signal number, errno, the address where the exception occurred and the signal code:

    fpu__save(fpu);
    
    task->thread.trap_nr	= trapnr;
    task->thread.error_code = error_code;
    info.si_signo		= SIGFPE;
    info.si_errno		= 0;
    info.si_addr		= (void __user *)uprobe_get_trap_addr(regs);
    info.si_code = fpu__exception_code(fpu, trapnr);
    

    After this we check the signal code and if it is zero we return:

    if (!info.si_code)
    	return;
    

    Or send the SIGFPE signal in the end:

    force_sig_info(SIGFPE, &info, task);
    

    That’s all.

    5. Conclusion

    This is the end of the sixth part of the Interrupts and Interrupt Handling chapter. In this part we saw the implementation of some exception handlers, such as the non-maskable interrupt and the SIMD and x87 FPU floating point exceptions. Finally, we have finished with the trap_init function in this part and will move on in the next part. Our next topic is external interrupts and the early_irq_init function from init/main.c.

    If you have any questions or suggestions write me a comment or ping me at twitter.

    Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to linux-insides.

    6. Links

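  • Clustering evaluation metrics: MI, NMI, AMI. Introduction: In unsupervised learning, two common tasks are clustering and dimensionality reduction. Three metrics for evaluating clustering quality are given here: mutual information, normalized mutual information and adjusted mutual information (MI, NMI, AMI), each with its computation method and code. Note that...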

    Clustering evaluation metrics: MI, NMI, AMI (mutual information, normalized mutual information, adjusted mutual information)

    Introduction

    In unsupervised learning, two common tasks are clustering and dimensionality reduction. This post presents three metrics for evaluating clustering quality: mutual information, normalized mutual information and adjusted mutual information (MI, NMI, AMI), with the computation method and code for each. Note that all three metrics require the true labels of the data points to be known.

    Preliminaries and Notation

    We are given $N$ data points of dimension $D$, forming the data matrix $\mathbf{X} = [x_1, x_2, \cdots, x_N] \in \mathbb{R}^{D\times N}$.

    For each data point $x_i \in \mathbb{R}^{D\times 1}$, the corresponding label is $u_i \in \mathbb{R}$.

    The labels take $R$ distinct values, that is, the data fall into $R$ classes, and the labels form the vector $U = [u_1, u_2, \cdots, u_N] \in \mathbb{R}^N$.

    We number the $N$ data points with the natural numbers $1$ to $N$ and let $S = \left\{1, 2, \cdots, N\right\}$. Then
    $$S = \mathop{\cup}\limits_{i=1}^R U_i$$
    where $U_i$ is the set of data points belonging to class $i$, so that $\left\{U_1, U_2, \cdots, U_R\right\}$ is a partition of $S$. For example, for the vector $U = [1, 1, 2, 2, 3, 3, 3]$ we have
    $$N = 7, \quad R = 3, \quad U_1 = \left\{1, 2\right\}, \quad U_2 = \left\{3, 4\right\}, \quad U_3 = \left\{5, 6, 7\right\}$$

    Note that if the whole data set has 5 classes, we represent these 5 classes by the natural numbers 1 to 5 by default;

    likewise, since $U$ contains $R$ classes, the natural numbers $1$ to $R$ represent these $R$ classes.

    The clustering problem is to produce a partition of the original data set $\mathbf{X}$. With some clustering algorithm (such as DBSCAN or spectral clustering) we obtain a partition of $\mathbf{X}$, namely $V = [v_1, v_2, \cdots, v_N] \in \mathbb{R}^N$.

    Similarly, we have
    $$S = \mathop{\cup}\limits_{i=1}^C V_i$$
    where $V_i$ is the set of data points assigned to cluster $i$ in the clustering result, so that $\left\{V_1, V_2, \cdots, V_C\right\}$ is a partition of $S$.

    Entropy and the contingency table

    Before introducing mutual information, we first introduce Shannon entropy (information entropy) and the contingency table.

    For the label vector $U \in \mathbb{R}^N$ above, the Shannon entropy is computed as
    $$\text{H}(U) = -\sum_{i=1}^R p_i \log p_i \tag{1}$$
    where the base of the logarithm is usually $2$ or $e$ (the natural logarithm);

    $p_i$ is the fraction of the data that belongs to class $i$, that is,
    $$p_i = \frac{\left| U_i \right|}{N} \tag{2}$$

    For the vector $U = [1, 1, 1]$ we have $p_1 = 1$ and $\text{H}(U) = - 1 \cdot \log(1) = 0$.

    For the vector $U = [1, 2, 3]$ we have $p_1 = p_2 = p_3 = \frac{1}{3}$ and $\text{H}(U) = - 3 \cdot \frac{1}{3} \cdot \log(\frac{1}{3}) = \log 3$.

    For the vector $U = [1, 2, 2]$ we have $p_1 = \frac{1}{3}, p_2 = \frac{2}{3}$ and $\text{H}(U) = - \frac{1}{3} \cdot \log(\frac{1}{3}) - \frac{2}{3} \cdot \log(\frac{2}{3}) = \log 3 - \frac{2}{3} \cdot \log 2$.

    It is easy to see that entropy is non-negative, because $0 < p_i \leq 1$.
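
The three entropy examples above can be checked with a few lines of Python. This is a minimal sketch using the natural logarithm; the function name entropy is chosen here for illustration and is not the sklearn function of the same name.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(U) = -sum_i p_i log p_i of a label vector (natural log)."""
    n = len(labels)
    counts = Counter(labels)  # |U_i| for every class i
    # Adding 0.0 turns the -0.0 produced by a single-class vector into 0.0.
    return -sum((c / n) * math.log(c / n) for c in counts.values()) + 0.0


h_single = entropy([1, 1, 1])   # one class only
h_uniform = entropy([1, 2, 3])  # three equally likely classes
h_skewed = entropy([1, 2, 2])   # p_1 = 1/3, p_2 = 2/3
print(h_single, h_uniform, h_skewed)
```

The three values agree with $0$, $\log 3$ and $\log 3 - \frac{2}{3}\log 2$ from the worked examples.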

    Let the matrix $M \in \mathbb{R}^{R\times C}$ be the contingency table of the true label vector $U$ and the predicted label vector $V$, with entries
    $$m_{ij} = \left|U_i \cap V_j\right| \tag{3}$$
    where $i = 1, 2, \cdots, R$ and $j = 1, 2, \cdots, C$.

    For example, for the vectors $U = [1, 1, 2, 2], V = [1, 1, 1, 2]$ we have
    $$\begin{aligned}U_1 &= \left\{1, 2\right\} & U_2 &= \left\{3, 4\right\} \\V_1 &= \left\{1, 2, 3\right\} & V_2 &= \left\{4\right\}\end{aligned}$$
    Therefore,
    $$\begin{aligned} m_{11} & = \left|U_1 \cap V_1\right| = \left|\left\{1, 2\right\}\right| = 2 \\ m_{12} & = \left|U_1 \cap V_2\right| = \left|\varnothing\right| = 0 \\ m_{21} & = \left|U_2 \cap V_1\right| = \left|\left\{3\right\}\right| = 1 \\ m_{22} & = \left|U_2 \cap V_2\right| = \left|\left\{4\right\}\right| = 1 \end{aligned}$$

    The contingency table is

    $$M = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}$$
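
Building the contingency table is a simple counting pass over the two label vectors. The sketch below assumes that rows and columns are ordered by ascending label value; the function name contingency_table is illustrative.

```python
def contingency_table(u, v):
    """Return M with M[i][j] = |U_i ∩ V_j| for two label vectors."""
    if len(u) != len(v):
        raise ValueError("label vectors must have the same length")
    rows = sorted(set(u))  # distinct true classes, ascending
    cols = sorted(set(v))  # distinct predicted clusters, ascending
    row_of = {label: i for i, label in enumerate(rows)}
    col_of = {label: j for j, label in enumerate(cols)}
    table = [[0] * len(cols) for _ in rows]
    for a, b in zip(u, v):
        table[row_of[a]][col_of[b]] += 1
    return table


M = contingency_table([1, 1, 2, 2], [1, 1, 1, 2])
print(M)  # [[2, 0], [1, 1]]
```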

    Mutual information

    Mutual information is computed as follows:

    $$\text{MI}(U, V) = \sum_{i = 1}^R\sum_{j=1}^C p_{i,j} \log\left(\frac{p_{i, j}}{p_i \times p_j}\right) \tag{4}$$
    where
    $$p_{i,j} = \frac{\left|U_i \cap V_j \right|}{N} = \frac{m_{ij}}{N} \tag{5}$$

    $$p_i = \frac{\left| U_i \right|}{N}, \quad p_j = \frac{\left| V_j \right|}{N}$$

    For example, for the vectors $U = [1, 1, 2, 2], V = [1, 1, 1, 2]$ we have
    $$\begin{aligned} p_{1, 1} & = m_{11} / N = 0.5 \\ p_{1, 2} & = m_{12} / N = 0 \\ p_{2, 1} & = m_{21} / N = 0.25 \\ p_{2, 2} & = m_{22} / N = 0.25 \end{aligned}$$
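
To finish the example, equation (4) can be evaluated directly from the counts. The sketch below uses the natural logarithm and assumes the usual convention that empty cells ($p_{i,j} = 0$) contribute nothing to the sum; the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(u, v):
    """MI(U, V) per equation (4), natural logarithm."""
    n = len(u)
    joint = Counter(zip(u, v))  # m_ij, non-empty cells only
    count_u = Counter(u)        # |U_i|
    count_v = Counter(v)        # |V_j|
    mi = 0.0
    for (a, b), m in joint.items():
        p_ij = m / n
        mi += p_ij * math.log(p_ij / ((count_u[a] / n) * (count_v[b] / n)))
    return mi


mi = mutual_information([1, 1, 2, 2], [1, 1, 1, 2])
print(round(mi, 4))  # 0.2158
```

By hand, the same value is $0.5\log\frac{4}{3} + 0.25\log\frac{2}{3} + 0.25\log 2 \approx 0.2158$.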

    Normalized mutual information (NMI)

    NMI and AMI are the metrics usually used to measure clustering quality.

    Normalized mutual information is computed as follows:

    $$\text{NMI}(U, V) = \frac{\text{MI}(U, V)}{F\left(\text{H}\left(U\right), \text{H}\left(V\right)\right)} \tag{6}$$

    where $F(x_1, x_2)$ can be the $\min$ or $\max$ function, the geometric mean $F(x_1, x_2) = \sqrt{x_1x_2}$, or the arithmetic mean $F(x_1, x_2) = \frac{x_1 + x_2}{2}$.

    Usually we choose the arithmetic mean, in which case normalized mutual information is computed as
    $$\text{NMI}(U, V) = 2 \frac{\text{MI}(U, V)}{\text{H}\left(U\right) + \text{H}\left(V\right)} \tag{7}$$
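
Equation (7) combines the entropy and mutual information defined earlier. The following self-contained sketch uses the natural logarithm; returning 1.0 when both labelings consist of a single class is an assumed convention for the degenerate case, not something stated in the text.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values()) + 0.0


def mutual_information(u, v):
    n = len(u)
    cu, cv = Counter(u), Counter(v)
    # p_ij / (p_i * p_j) simplifies to m_ij * N / (|U_i| * |V_j|)
    return sum(m / n * math.log(m * n / (cu[a] * cv[b]))
               for (a, b), m in Counter(zip(u, v)).items())


def nmi(u, v):
    """NMI with the arithmetic mean as normaliser, equation (7)."""
    denom = entropy(u) + entropy(v)
    if denom == 0:  # both labelings have a single class (assumed convention)
        return 1.0
    return 2 * mutual_information(u, v) / denom


score = nmi([1, 1, 2, 2], [1, 1, 1, 2])
print(round(score, 4))  # 0.3437
```

Relabelling the clusters does not change the score: nmi([1, 1, 2, 2], [5, 5, 7, 7]) is 1.0, because only the partition matters.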

    调整互信息(AMI, Adjusted Mutual Information)

    调整互信息的计算要复杂一些,其计算方法如下:

    AMI ( U , V ) = MI ( U , V ) − E { MI ( U , V ) } F ( H ( U ) , H ( V ) ) − E { MI ( U , V ) } (8) \text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb E\left\{ \text{MI}(U, V) \right\}}{F\left(\text{H}\left(U\right), \text{H}\left(V\right)\right) - \mathbb E\left\{ \text{MI}(U, V) \right\}} \tag{8} AMI(U,V)=F(H(U),H(V))E{MI(U,V)}MI(U,V)E{MI(U,V)}(8)

    其中, E { MI ( U , V ) } \mathbb E\left\{ \text{MI}(U, V) \right\} E{MI(U,V)} 为互信息 MI ( U , V ) \text{MI}(U, V) MI(U,V) 的期望,计算方法为
    E { MI ( U , V ) } = ∑ i = 1 R ∑ j = 1 C ∑ k = ( a i + b j − N ) + min ⁡ ( a i , b j ) k N log ⁡ ( N × k a i × b j ) a i ! b j ! ( N − a i ) ! ( N − b j ) ! N ! k ! ( a i − k ) ! ( b j − k ) ! ( N − a i − b j + k ) ! (9) \begin{aligned} \mathbb E\left\{ \text{MI}(U, V) \right\} &= \\ \sum_{i=1}^R \sum_{j=1}^C &\sum_{k = \left(a_i + b_j - N \right)^+}^{\min \left(a_i, b_j\right)} \frac{k}{N} \log\left(\frac{N \times k}{a_i \times b_j}\right)\frac{a_i!b_j!\left(N - a_i\right)!\left(N - b_j\right)!}{N!k!\left(a_i - k\right)!\left(b_j - k\right)!\left(N - a_i - b_j + k\right)!} \end{aligned} \tag{9} E{MI(U,V)}i=1Rj=1C=k=(ai+bjN)+min(ai,bj)Nklog(ai×bjN×k)N!k!(aik)!(bjk)!(Naibj+k)!ai!bj!(Nai)!(Nbj)!(9)
    其中 ( a i + b j − N ) + \left(a_i + b_j - N \right)^+ (ai+bjN)+ max ⁡ ( 1 , a i + b j − N ) \max \left(1, a_i + b_j - N \right) max(1,ai+bjN)

    a i , b j a_i, b_j ai,bj 分别为列联表 M M M 的第 i i i 行和与第 j j j 列和,具体为
    a i = ∑ j = 1 C m i j b j = ∑ i = 1 R m i j (10) \begin{aligned} a_i = \sum_{j=1}^C m_{ij} \\ b_j = \sum_{i=1}^R m_{ij} \end{aligned} \tag{10} ai=j=1Cmijbj=i=1Rmij(10)
    如果我们选取函数 F ( x 1 , x 2 ) F(x_1, x_2) F(x1,x2) max ⁡ \max max 函数,则调整互信息可被计算为
    AMI ( U , V ) = MI ( U , V ) − E { MI ( U , V ) } max ⁡ ( H ( U ) , H ( V ) ) − E { MI ( U , V ) } (11) \text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb E\left\{ \text{MI}(U, V) \right\}}{\max\left(\text{H}\left(U\right), \text{H}\left(V\right)\right) - \mathbb E\left\{ \text{MI}(U, V) \right\}} \tag{11} AMI(U,V)=max(H(U),H(V))E{MI(U,V)}MI(U,V)E{MI(U,V)}(11)

    If we choose $F(x_1, x_2)$ to be the geometric mean, the adjusted mutual information becomes

    $$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb E\left\{ \text{MI}(U, V) \right\}}{\sqrt{\text{H}(U) \cdot \text{H}(V)} - \mathbb E\left\{ \text{MI}(U, V) \right\}} \tag{12}$$

    If we choose $F(x_1, x_2)$ to be the arithmetic mean, the adjusted mutual information becomes

    $$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb E\left\{ \text{MI}(U, V) \right\}}{\frac{1}{2}\left(\text{H}(U) + \text{H}(V)\right) - \mathbb E\left\{ \text{MI}(U, V) \right\}} \tag{13}$$
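To see how the choice of $F$ changes the score, here is a minimal pure-Python sketch that transcribes Eqs. (9) to (13) directly. It is not the sklearn implementation; the helper names (`entropy`, `mutual_info`, `expected_mutual_info`, `ami`, `NORMALIZERS`) are assumptions made for this illustration. It relies on Python's arbitrary-precision integers for the factorials and does not handle the degenerate case where the denominator is zero.

```python
from collections import Counter
from math import factorial, log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    """Mutual information from the non-empty cells of the contingency table."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    return sum(m / n * log(n * m / (a[x] * b[y]))
               for (x, y), m in Counter(zip(u, v)).items())

def expected_mutual_info(u, v):
    """Direct transcription of Eq. (9)."""
    n = len(u)
    total = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            for k in range(max(1, ai + bj - n), min(ai, bj) + 1):
                num = (factorial(ai) * factorial(bj)
                       * factorial(n - ai) * factorial(n - bj))
                den = (factorial(n) * factorial(k) * factorial(ai - k)
                       * factorial(bj - k) * factorial(n - ai - bj + k))
                total += k / n * log(n * k / (ai * bj)) * (num / den)
    return total

# Candidate choices of F(H(U), H(V)) from Eqs. (11)-(13).
NORMALIZERS = {
    "max": max,
    "geometric": lambda x, y: sqrt(x * y),
    "arithmetic": lambda x, y: (x + y) / 2,
}

def ami(u, v, method="arithmetic"):
    """Adjusted mutual information, Eq. (8), for a chosen normalizer F."""
    emi = expected_mutual_info(u, v)
    f = NORMALIZERS[method](entropy(u), entropy(v))
    return (mutual_info(u, v) - emi) / (f - emi)

C = [1, 1, 2, 2, 3, 3, 3]
D = [1, 1, 1, 2, 1, 1, 1]
for method in NORMALIZERS:
    print(method, ami(C, D, method))
```

Because max ≥ arithmetic mean ≥ geometric mean, the max normalizer yields the smallest AMI and the geometric mean the largest whenever the numerator and denominators are positive. For `C` and `D`, the arithmetic variant should agree with the value reported for this pair in the results section.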

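    Implementation

    In Python, the sklearn library provides functions for all three metrics (MI, NMI, AMI) that can be called directly.

    For MATLAB I could not find a ready-made package, so I wrote the functions myself.

    python

    Here both NMI and AMI use the arithmetic mean as the normalizer, and $\log$ is the natural logarithm (base $e$).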

    from sklearn.metrics.cluster import entropy, mutual_info_score, normalized_mutual_info_score, adjusted_mutual_info_score
    
    MI = lambda x, y: mutual_info_score(x, y)
    NMI = lambda x, y: normalized_mutual_info_score(x, y, average_method='arithmetic')
    AMI = lambda x, y: adjusted_mutual_info_score(x, y, average_method='arithmetic')
    
    A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
    B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
    #print(entropy(A))
    #print(MI(A, B))
    print(NMI(A, B))
    print(AMI(A, B))
    
    C = [1, 1, 2, 2, 3, 3, 3]
    D = [1, 1, 1, 2, 1, 1, 1]
    print(NMI(C, D))
    print(AMI(C, D))
    

    matlab

    Here both NMI and AMI are again implemented with the arithmetic mean, and $\log$ is the natural logarithm (base $e$).

    Function file:

    NMI_AMI.m

    function [NMI, AMI] = NMI_AMI(X, Y)
    %NMI_AMI return NMI and AMI of two labelings X and Y
    % MI: mutual information
    % H: entropy
    % NMI: normalized mutual information
    % AMI: adjusted mutual information
    % NMI(X, Y) = MI(X, Y) / F(H(X), H(Y))
    % AMI(X, Y) = (MI(X, Y) - EMI(X, Y)) / (F(H(X), H(Y)) - EMI(X, Y))
    % F(x, y) is a normalizer; it can be "min", "max", "geometric", "arithmetic"
    % here both NMI and AMI use the arithmetic mean
    
    NMI = 2 * MI(X, Y) / (H(X) + H(Y));
    
    AMI = (MI(X, Y) - EMI(X, Y)) / (1/2 * (H(X) + H(Y)) - EMI(X, Y));
    
    end
    
    
    
    function [res] = MI(X, Y)
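    %MI mutual information
    
    n = length(X);
    % force row vectors so the for-loops below iterate over single labels,
    % even when X or Y is passed as a column vector
    X_list = unique(X(:)).';
    Y_list = unique(Y(:)).';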
    res = 0;
    for x = X_list
        for y = Y_list
            loc_x = find(X == x);
            loc_y = find(Y == y);
            loc_xy = intersect(loc_x, loc_y);
            res = res + length(loc_xy) / n * log(length(loc_xy) / n / ((length(loc_x) / n) * (length(loc_y) / n)) + eps);
        end
    end
    
    end
    
    
    
    function [res] = H(X)
    %H information entropy
    
    n = length(X);
    X_list = unique(X(:)).';  % row vector, so the loop visits one label at a time
    res = 0;
    
    for x = X_list
        loc = find(X == x);
        px = length(loc) / n;
        res = res - px * log(px);
    end
    
    end
    
    
    
    function [res] = f(a, b)
    % F calculate a! / b!
    % sometimes a and b can be very large, hence, directly calculate a! or b! is not
    % suitable; but maybe a-b is small; 
    % a,b should both be positive integers
    
    res = 1;
    if a > b
        for i = b+1:a
            res = res * i;
        end
    elseif a < b
        for i = a+1:b
            res = res / i;
        end
    else
        res = 1;
    end
    
    end
    
    
    
    function [res] = EMI(U, V)
    % EMI expected mutual information, E[MI(X, Y)]
    
    N = length(U);
    
    U_list = unique(U);
    V_list = unique(V);
    R = length(U_list);
    C = length(V_list);
    
    M = zeros(R, C);
    for i = 1:R
        for j = 1:C
            U_loc = find(U == U_list(i));
            V_loc = find(V == V_list(j));
            M(i, j) = length(intersect(U_loc, V_loc));
        end
    end
    
    a = sum(M, 2);
    b = sum(M, 1);
    res = 0;
    
    for i = 1:R
        for j = 1:C
            for nij = max(a(i) + b(j) - N, 1):min(a(i), b(j))
                res = res + nij / N * log(N * nij / (a(i) * b(j)) + eps) * f(a(i), a(i) - nij) * f(b(j), b(j) - nij) * f(N - a(i), N) * f(N - b(j), N - a(i) - b(j) + nij) / factorial(nij);
            end
        end
    end
    
    end
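The helper `f(a, b)` above avoids computing large factorials by multiplying only the terms between `b` and `a`. Another common way to evaluate a factorial ratio is through the log-gamma function, using $\log(x!) = \log\Gamma(x + 1)$. The sketch below shows the idea in Python; `factorial_ratio` is a hypothetical name introduced here, and in MATLAB the analogous building block would be `gammaln`.

```python
from math import exp, lgamma

def factorial_ratio(a, b):
    """Return a! / b! for non-negative integers, computed in log space."""
    # lgamma(x + 1) equals log(x!), so the difference is log(a! / b!).
    return exp(lgamma(a + 1) - lgamma(b + 1))

# Small case: 10! / 7! = 10 * 9 * 8
print(factorial_ratio(10, 7))

# Large case: 1000! does not fit in a double, but the ratio 1000 * 999 does.
print(factorial_ratio(1000, 998))
```

The results are floating-point approximations rather than exact integers, which is usually acceptable here because the weights are multiplied into a floating-point sum anyway.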
    
    

    Main file (or script):

    clc;
    format long
    
    A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3];
    B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3];
    [NMI, AMI] = NMI_AMI(A, B)
    
    C = [1, 1, 2, 2, 3, 3, 3];
    D = [1, 1, 1, 2, 1, 1, 1];
    [NMI, AMI] = NMI_AMI(C, D)
    

    Comparing the results

    The Python code prints (NMI and AMI for A, B, then NMI and AMI for C, D):

    0.36456177185718985

    0.26018122538925054

    0.28483386264113447

    0.056748831755324296

    The MATLAB code prints:

    NMI =

    0.364561771857190

    AMI =

    0.260181225389251

    NMI =

    0.284833862641135

    AMI =

    0.056748831755324

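    Summary and reflections

    This post only gives the formulas and code for computing the three metrics so that they can be used directly; it does not examine their respective strengths, weaknesses, or the ideas behind their construction.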

    References

    1. https://en.wikipedia.org/wiki/Adjusted_mutual_information (may require a proxy to access)

    2. Vinh, Nguyen Xuan; Epps, Julien; Bailey, James (2010), "Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance" (PDF), The Journal of Machine Learning Research, 11 (Oct): 2837–54

      Paper link: http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf

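  • Normalized mutual information (NMI) for evaluating community quality in complex networks, suitable for overlapping communities; Java implementation.
  • Common uses of the NMI in chips

    2021-09-02 11:32:24
    NMI, i.e. non maskable interrupt, has the highest priority in ARM Cortex-M processors and is typically used to receive and handle events that threaten the safety of the chip. From the projects I have worked on and what I have learned, I summarize three common uses of the NMI: 1. the watchdog counter, because the system...
  • NMI (non-maskable interrupt) on DELL iDRAC

    1,000+ views 2021-02-24 21:28:19
    NMI stands for Non Maskable Interrupt; in DSP and similar courses it is explained as a "non-maskable interrupt". The Dell iDRAC restart options include NMI (non-maskable interrupt), and I did not know what it was for, hence this note. In plain terms: NMI (non-maskable interrupt) usually requires a crash dump directory to be configured in advance; when a hardware...
  • Clustering evaluation metric NMI (normalized mutual information) + Python implementation + sklearn usage. Contents: concept, motivating example, formulas, information entropy, relative entropy, mutual information, normalized mutual information (NMI), code, python, sklearn. Concept: normalized mutual information (NMI) is used to...
  • This article analyzes how the Linux kernel detects hard lockups on the x86 architecture. Based on the kernel 2.6.24 source, it walks through the handling flow from the NMI interrupt firing to the oops. It uses a clock counter to decide whether a hard lockup has occurred.
  • Newport Media has begun sampling the NMI310 "Sundance H", a mobile TV receiver IC that integrates the RF tuner, the demodulator, and all necessary memory blocks. According to the company, it is the most highly integrated mobile TV chip in the industry. The chip targets DVB-H, DVB-T, T-DMB, ...
  • Steps for computing normalized mutual information (NMI) and a Python implementation

    10,000+ views, many likes 2017-10-28 21:37:19
    Excellence is a continuous process and not an ... Steps for computing normalized mutual information (NMI) and a Python implementation. For the precise definition of NMI, see another blog post: https://smj2284672469.github.io/2017/10/27/community-detection-mea
  • Code: def nmi(X, Y): """ X:n*Kx Y:n*Ky """ X = X.T Y = Y.T def cmp(x, y): """a b c d""" a = (1 - x).dot(1 - y) d = x.dot(y) c = (1 - y).dot(x) b = (1 - x).dot(y) return a, b, c, d def h(w, n): """h(w,n...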
