  • IO WAIT

    2020-01-10 16:03:23

    Definition: the percentage of a sampling period that falls under the following condition: the CPU is idle while there are still outstanding, unfinished I/O requests.

    Note: it does not mean the CPU is unable to do work, and it does not prove that I/O is a bottleneck.

    Example 1: while the CPU is busy, IO WAIT stays the same no matter how much I/O there is; once CPU load drops, part of that I/O falls into idle periods and IO WAIT rises.

    Example 2: with high I/O concurrency, IO WAIT is low; with low I/O concurrency, IO WAIT is high.

    Conclusion: when IO WAIT rises, example 1 shows this does not prove that more processes are waiting for I/O, and example 2 shows it does not prove that the total time spent waiting for I/O has grown.
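    A minimal sketch (my addition, not part of the original post) of how this percentage can be measured directly: sample the aggregate cpu line of /proc/stat twice and diff it. With the "cpu" label as the first field, iowait is the sixth field; the 5-second window is only illustrative.

    #!/bin/sh
    # Approximate system-wide %iowait over a 5-second window from /proc/stat.
    # Fields on the "cpu" line: user nice system idle iowait irq softirq steal ...
    s1=$(head -n1 /proc/stat); sleep 5; s2=$(head -n1 /proc/stat)
    printf '%s\n%s\n' "$s1" "$s2" | awk '
        NR==1 { for (i = 2; i <= 9 && i <= NF; i++) t1 += $i; w1 = $6 }
        NR==2 { for (i = 2; i <= 9 && i <= NF; i++) t2 += $i; w2 = $6
                printf "approx %%iowait = %.1f%%\n", 100*(w2-w1)/(t2-t1) }'

    The fields after steal (guest time) are left out of the total because, as far as I know, guest time is already folded into the user field on recent kernels.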

     

  • io wait

    2017-08-25 11:05:42
    Strictly speaking, iowait only measures how much of the CPU's idle time passed while I/O was still executing; it reflects the behavior of the I/O device. The CPU itself does not wait for the I/O, but the thread or process that issued the I/O has to wait for the operation to complete.
    The man page for top explains it as follows:
    iowait is the percentage of the time that the system was idle and at least one process was waiting for disk IO to finish.

    The precise meaning of I/O wait time in Linux
    November 18, 2013 (updated March 3, 2015)
    Some time ago I had a discussion with some systems guys about the exact meaning of the I/O wait time which is displayed by top as a percentage of total CPU time. Their answer was that it is the time spent by the CPU(s) while waiting for outstanding I/O operations to complete. Indeed, the man page for the top command defines this as the “time waiting for I/O completion”.


    However, this definition is obviously not correct (or at least not complete), because a CPU never spends clock cycles waiting for an I/O operation to complete. Instead, if a task running on a given CPU blocks on a synchronous I/O operation, the kernel will suspend that task and allow other tasks to be scheduled on that CPU.


    So what is the exact definition then? There is an interesting Server Fault question that discussed this. Somebody came up with the following definition that describes I/O wait time as a sub-category of idle time:


    iowait is time that the processor/processors are waiting (i.e. is in an idle state and does nothing), during which there in fact was outstanding disk I/O requests.
    That makes perfect sense for uniprocessor systems, but there is still a problem with that definition when applied to multiprocessor systems. In fact, “idle” is a state of a CPU, while “waiting for I/O completion” is a state of a task. However, as pointed out earlier, a task waiting for outstanding I/O operations is not running on any CPU. So how can the I/O wait time be accounted for on a per-CPU basis?


    For example, let’s assume that on an otherwise idle system with 4 CPUs, a single, completely I/O bound task is running. Will the overall I/O wait time be 100% or 25%? I.e. will the I/O wait time be 100% on a single CPU (and 0% on the others), or on all 4 CPUs? This can be easily checked by doing a simple experiment. One can simulate an I/O bound process using the following command, which will simply read data from the hard disk as fast as it can:


    dd if=/dev/sda of=/dev/null bs=1MB
    Note that you need to execute this as root and if necessary change the input file to the appropriate block device for your hard disk.


    Looking at the CPU stats in top (press 1 to get per-CPU statistics), you should see something like this:


    %Cpu0  :  3,1 us, 10,7 sy,  0,0 ni,  3,5 id, 82,4 wa,  0,0 hi,  0,3 si,  0,0 st
    %Cpu1  :  3,6 us,  2,0 sy,  0,0 ni, 90,7 id,  3,3 wa,  0,0 hi,  0,3 si,  0,0 st
    %Cpu2  :  1,0 us,  0,3 sy,  0,0 ni, 96,3 id,  2,3 wa,  0,0 hi,  0,0 si,  0,0 st
    %Cpu3  :  3,0 us,  0,3 sy,  0,0 ni, 96,3 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
    This output indicates that a single I/O bound task only increases the I/O wait time on a single CPU. Note that you may occasionally see the task “switch” from one CPU to another. That is because the Linux kernel tries to schedule a task on the CPU it ran last (in order to improve CPU cache hit rates), but this is not always possible and the task is moved to another CPU. On some systems, this may occur so frequently that the I/O wait time appears to be distributed over multiple CPUs, as in the following example:


    %Cpu0  :  5.7 us,  5.7 sy,  0.0 ni, 50.5 id, 34.8 wa,  3.3 hi,  0.0 si,  0.0 st
    %Cpu1  :  3.0 us,  3.3 sy,  0.0 ni, 72.5 id, 20.9 wa,  0.3 hi,  0.0 si,  0.0 st
    %Cpu2  :  7.0 us,  4.3 sy,  0.0 ni, 62.0 id, 26.7 wa,  0.0 hi,  0.0 si,  0.0 st
    %Cpu3  :  4.3 us,  2.3 sy,  0.0 ni, 89.6 id,  3.7 wa,  0.0 hi,  0.0 si,  0.0 st
    Nevertheless, assuming that dd is the only task doing I/O on the system, there can be at most one CPU in state I/O wait at any given point in time. Indeed, 34.8+20.9+26.7+3.7=86.1 which is close to but lower than 100.


    To make the experiment more reproducible, we can use the taskset command to “pin” a process to a given CPU (note that the first command line argument is not the CPU number but a bitmask, so 1 selects CPU 0):


    taskset 1 dd if=/dev/sda of=/dev/null bs=1MB
    Another interesting experiment is to run a CPU bound task at the same time on the same CPU:


    taskset 1 sh -c "while true; do true; done"
    The I/O wait time now drops to 0 on that CPU (and also remains 0 on the other CPUs), while user and system time account for 100% CPU usage:


    %Cpu0  : 80,3 us, 15,5 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  4,3 si,  0,0 st
    %Cpu1  :  4,7 us,  3,4 sy,  0,0 ni, 91,3 id,  0,0 wa,  0,0 hi,  0,7 si,  0,0 st
    %Cpu2  :  2,3 us,  0,3 sy,  0,0 ni, 97,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
    %Cpu3  :  2,7 us,  4,3 sy,  0,0 ni, 93,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
    That is expected because I/O wait time is a sub-category of idle time, and the CPU to which we pinned both tasks is never idle.


    These findings allow us to deduce the exact definition of I/O wait time:


    For a given CPU, the I/O wait time is the time during which that CPU was idle (i.e. didn’t execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).


    Note that the nuance is not innocent and has practical consequences. For example, on a system with many CPUs, even if there is a problem with I/O performance, the observed overall I/O wait time may still be small if the problem only affects a single task. It also means that while it is generally correct to say that faster CPUs tend to increase I/O wait time (simply because a faster CPU tends to be idle more often), that statement is no longer true if one replaces “faster” by “more”.
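    Besides pressing 1 in top, the same per-CPU breakdown can be sampled with mpstat from the sysstat package. A quick way to repeat the pinned-dd experiment (my sketch, assuming sysstat is installed, /dev/sda is the right device, and the commands are run as root):

    # Pin the reader to CPU 0, print per-CPU utilization (including %iowait)
    # averaged over one 5-second interval, then stop the reader again.
    taskset 1 dd if=/dev/sda of=/dev/null bs=1MB &
    mpstat -P ALL 5 1
    kill $!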
  • iowait

    2018-06-17 17:27:00
    https://www.cnblogs.com/fuyuanming/articles/6497005.html


  • IOwait: what is it actually waiting for

    2020-04-17 19:57:08
    Iowait

    %IOwait is a puzzling parameter. It is a key figure reported by top/iostat/mpstat/sar, it literally reads as the time the system spends waiting for I/O, and it is commonly used as a measure of system I/O pressure. What is the CPU doing during IOwait? When does time count as IOwait, and what exactly is being waited for?

     

    What the figure means and where it comes from

    The official definition, from man mpstat:

    %iowait
               Percentage  of  time that the CPU or CPUs were idle during which the system
               had an outstanding disk I/O request.

    The statistics tools define IOwait as the percentage of time during which the CPU is idle while the system has an outstanding I/O request; in other words, two conditions must both hold for time to be recorded as IOwait:

    1. The CPU is idle

    2. An I/O request is still being processed

    The %iowait figure comes from the 5th value on the cpu lines of /proc/stat. So where does that value come from, in turn?

     

    How the kernel accounts IOwait time (kernel 5.3)

    1. Where the /proc/stat data comes from

    In fs/proc/stat.c, show_stat() shows the path by which /proc/stat obtains iowait: get_iowait_time() -> get_cpu_iowait_time_us().

    Depending on whether the CPU is currently online or offline, iowait is read either from cpustat.iowait or from get_cpu_iowait_time_us():

    static u64 get_iowait_time(struct kernel_cpustat *kcs, int cpu)
    {
    	u64 iowait, iowait_usecs = -1ULL;
    
    	if (cpu_online(cpu))
    		iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
    
    	if (iowait_usecs == -1ULL)
    		/* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
    		iowait = kcs->cpustat[CPUTIME_IOWAIT];
    	else
    		iowait = iowait_usecs * NSEC_PER_USEC;
    
    	return iowait;
    }

    get_cpu_iowait_time_us() in turn takes its value from each CPU's ts->iowait_sleeptime:

    u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
    {
    	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
    	ktime_t now, iowait;
    
    	if (!tick_nohz_active)
    		return -1;
    
    	now = ktime_get();
    	if (last_update_time) {
    		update_ts_time_stats(cpu, ts, now, last_update_time);
    		iowait = ts->iowait_sleeptime;
    	} else {
    		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
    			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
    
    			iowait = ktime_add(ts->iowait_sleeptime, delta);
    		} else {
    			iowait = ts->iowait_sleeptime;
    		}
    	}
    	return ktime_to_us(iowait);
    }

    So iowait comes from either cpustat.iowait or the per-CPU ts->iowait_sleeptime.

     

    2. How cpustat.iowait / iowait_sleeptime are accumulated

    The two counters are accumulated by the following functions:

    void account_idle_time(u64 cputime)
    {
    	u64 *cpustat = kcpustat_this_cpu->cpustat;
    	struct rq *rq = this_rq();
    
    	if (atomic_read(&rq->nr_iowait) > 0)
    		cpustat[CPUTIME_IOWAIT] += cputime;
    	else
    		cpustat[CPUTIME_IDLE] += cputime;
    }
    static void
    update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
    {
    	ktime_t delta;
    
    	if (ts->idle_active) {
    		delta = ktime_sub(now, ts->idle_entrytime);
    		if (nr_iowait_cpu(cpu) > 0)
    			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
    		else
    			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
    		ts->idle_entrytime = now;
    	}
    
    	if (last_update_time)
    		*last_update_time = ktime_to_us(now);
    }

    The call paths are:

    do_idle()->tick_nohz_idle_exit()->__tick_nohz_idle_restart_tick()->tick_nohz_account_idle_ticks()->account_idle_ticks()->account_idle_time()

    do_idle()->tick_nohz_idle_exit()->tick_nohz_stop_idle()->update_ts_time_stats()

    In other words, the accounting is triggered when the CPU goes idle; which of the two counters is updated depends on the CPU's state.

    Both counters follow the same logic: the nr_iowait value of the current CPU's runqueue decides whether the elapsed time is added to idle or to iowait.

     

    3. Where nr_iowait is counted

    At schedule time, if the task being switched out is marked in_iowait, the nr_iowait counter of the current CPU's runqueue is incremented by 1, indicating that a task on this CPU is waiting for I/O; see __schedule():

     

    	if (!preempt && prev->state) {
    		if (signal_pending_state(prev->state, prev)) {
    			prev->state = TASK_RUNNING;
    		} else {
    			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
    
    			if (prev->in_iowait) {
    				atomic_inc(&rq->nr_iowait);
    				delayacct_blkio_start();
    			}
    		}
    		switch_count = &prev->nvcsw;
    	}

    A task's in_iowait flag is set in io_schedule_prepare(); the functions that call io_schedule_prepare() include io_schedule(), io_schedule_timeout(), mutex_lock_io(), mutex_lock_io_nested(), and so on. That is, when one of these calls leads to a reschedule, the CPU is marked as having a task in iowait; when the task is woken up again, in_iowait is restored and the runqueue's nr_iowait is decremented.

    int io_schedule_prepare(void)
    {
    	int old_iowait = current->in_iowait;
    
    	current->in_iowait = 1;
    	blk_schedule_flush_plug(current);
    
    	return old_iowait;
    }

    In short, when a task is scheduled off a CPU because of I/O, that CPU is flagged as having a task waiting for I/O; when the CPU later goes idle, the elapsed time is accounted as IOwait if such a task still exists and as idle otherwise, which matches the definition in man mpstat.
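    A side note of mine (not in the original write-up): as far as I can tell, the per-CPU rq->nr_iowait counters are summed by nr_iowait() and exported to user space as the procs_blocked line of /proc/stat, so the number of tasks currently sleeping in iowait can be checked directly; vmstat's "b" column reports the same figure, as far as I know.

    # Tasks currently blocked waiting for I/O (the summed nr_iowait counters).
    grep procs_blocked /proc/stat
    # Watch the "b" (blocked) column while an I/O-heavy workload runs.
    vmstat 1 5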

     

    Where I/O blocks

    There are many I/O paths in the system and correspondingly many possible blocking points; listed here are the two most common ones in ordinary I/O operations.

    1. After the I/O has been submitted to the driver, waiting for the data to come back.

    2. Under concurrent I/O, contending for resources such as the software/hardware I/O queues.

    Using kprobe, we can observe the io_schedule call stacks in two test scenarios, single-threaded I/O and multi-threaded I/O (a sketch of one way to capture such traces follows the two call stacks below).

    Single-threaded read: only one call stack shows up

                 fio-7834  [001] d... 875382.127151: io_schedule: (io_schedule+0x0/0x40)
                 fio-7834  [001] d... 875382.127163: <stack trace>
     => io_schedule
     => ext4_file_read_iter
     => new_sync_read
     => __vfs_read
     => vfs_read
     => ksys_pread64
     => sys_pread64
     => ret_fast_syscall

    Multi-threaded read: an additional I/O reschedule appears, caused by contention for I/O resources

                fio-9800  [001] d... 875471.769845: io_schedule: (io_schedule+0x0/0x40)
                 fio-9800  [001] d... 875471.769858: <stack trace>
     => io_schedule
     => ext4_file_read_iter
     => new_sync_read
     => __vfs_read
     => vfs_read
     => ksys_pread64
     => sys_pread64
     => ret_fast_syscall
     => 0xbe9445a8
                 fio-9801  [003] d... 875471.770153: io_schedule: (io_schedule+0x0/0x40)
                 fio-9801  [003] d... 875471.770164: <stack trace>
     => io_schedule
     => blk_mq_get_request
     => blk_mq_make_request
     => generic_make_request
     => submit_bio
     => ext4_mpage_readpages
     => ext4_readpages
     => read_pages
     => __do_page_cache_readahead
     => force_page_cache_readahead
     => page_cache_sync_readahead
     => generic_file_read_iter
     => ext4_file_read_iter
     => new_sync_read
     => __vfs_read
     => vfs_read
     => ksys_pread64
     => sys_pread64
     => ret_fast_syscall
     => 0xbe9445a8
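    For reference, a rough sketch of one way such io_schedule stack traces can be captured with ftrace kprobe events (my addition, not from the original write-up); it assumes debugfs/tracefs is mounted at /sys/kernel/debug, CONFIG_KPROBE_EVENTS is enabled, and the probe name io_sched is just an arbitrary label:

    #!/bin/sh
    # Probe every call to io_schedule() and dump the kernel stack for each hit.
    cd /sys/kernel/debug/tracing || exit 1
    echo 'p:io_sched io_schedule' > kprobe_events        # define the kprobe event
    echo stacktrace > events/kprobes/io_sched/trigger    # record a stack per hit
    echo 1 > events/kprobes/io_sched/enable
    sleep 10                                             # run the fio workload meanwhile
    echo 0 > events/kprobes/io_sched/enable
    cat trace
    echo '!stacktrace' > events/kprobes/io_sched/trigger # remove the trigger
    echo '-:io_sched' >> kprobe_events                   # delete the probe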

     

    Conclusion

    IOwait is the time during which the CPU is idle while there is currently a task waiting for I/O.

    Scheduling caused by I/O blocking mainly occurs when 1. waiting for data to return, and 2. competing for resources under concurrent I/O.

    Many factors influence this time: not only I/O load but also CPU load heavily affects the value, so IOwait alone should not be used to judge the system's I/O pressure; it has to be weighed together with data from iostat and similar tools.
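    For that device-side cross-check, extended iostat output is the usual companion (a minimal example, assuming the sysstat package is installed):

    # Extended per-device statistics every 5 seconds: %util, the await columns and
    # the queue size say much more about disk pressure than %iowait alone.
    iostat -x 5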

     

     

  • scope-iowait source code

    2021-05-12 10:21:11
    Scope IOWait plugin: the Scope IOWait plugin is a Go application that provides host-level CPU I/O wait or idle metrics in the UI. How to run the Scope IOWait plugin: the Scope IOWait plugin can be run on its own. It responds in JSON format at /var/run/scope/plugins/iowait/io...
  • Pinpointing iowait problems

    2018-05-17 18:57:48
    iowait? It is process wait caused by I/O: the system is doing I/O, no process has work to do, and the CPU idles spinning in the idle task. So two conditions must hold for iowait to arise: first, a process is waiting for I/O; second, no process is runnable while it waits. What iowait actually measures is CPU time io...
  • Linux high IOwait is a common Linux performance issue. Today we will look at what iowait means and what contributes to this problem. Hope this can give you more ideas about high IOwait issue. What is ...
  • Making sense of Linux iowait

    2020-11-30 21:43:06
    %iowait is a metric shown when checking CPU utilization with Linux monitoring tools (iostat, sar, etc.); on some Unix versions it appears as %wio. The metric is frequently misread, and many people treat it as an indicator of an I/O bottleneck; every once in a while I myself run into %iowait ...
  • We're seeing some issues with a Shoryuken based system generating lots of IO wait - is that normal? To put this in context, we're routinely seeing 90%+ IO wait on the servers running Shoryuken. I...
  • Common misconceptions about iowait

    2019-09-27 19:04:49
    %iowait is a metric shown by tools such as “sar -u” when checking CPU utilization; it appears as %iowait on Linux and as %wio on some Unix versions, with the same meaning. The metric is often misread, and many people take it as a sign of I/O problems; every once in a while I ...
  • High IOwait on managers

    2020-11-29 09:44:21
    We're getting high IOwait values on our managers (both version 3.8.1 and the recently upgraded 3.10.2). In our datacenter with high speed SSD drives, we are getting around 20% IOwait. It was higher (40%) ...
  • A closer look at %IOWAIT

    2017-10-20 15:43:41
    I used to assume that seeing %IOWAIT in top meant something was wrong; after discussing it with a former colleague, I came away with a new understanding. %iowait is a metric shown by tools such as “sar -u” when checking CPU utilization; it appears as %iowait on Linux and as %wio on some Unix versions, ...
  • iostat and iowait explained in detail

    2017-11-12 21:22:00
    iostat and iowait explained in detail. %iowait does not reflect a disk bottleneck. What iowait actually measures is CPU time: %iowait = (CPU iowait time) / (all CPU time). The article explains that a fast CPU can produce a high iowait value, but that does not mean the disk is the system's bottleneck. The only thing that can ...
  • iowait

    2013-02-22 11:36:02
    The scenario at the time: on the VM both iowait and %util were at 100%. The physical machine was slightly better, with iowait at 30%~50% and %util also close to 100%. In the end we had to reboot the physical machine, it would not boot into the system, and while preparing to reinstall we noticed a yellow light on one disk. Since there was RAID, we pulled the bad disk and it worked fine ...
  • Finding and fixing high iowait

    2020-04-24 11:38:58
    1. iostat and iowait explained: checking for disk bottlenecks. 1. iostat basics: %iowait does not reflect a disk bottleneck. 1) Install iostat: the package is named sysstat, yum install sysstat -y. 2) What iowait actually measures is CPU time: %iowait = (cpu idle ...
  • Dealing with high iowait

    2019-02-25 15:41:00
    NMS alert: alert host: YiDHLWJKFZ-js-app-16; host IP: 192.168.***.***; alert item: system.cpu.util[,iowait]; alert time: 2019.02.25 15:18:48 ... alert message: Disk I/O is overloaded on ... details: CPU iowait ti...
  • A case of troubleshooting high CPU iowait

    2021-04-02 16:55:17
    In my earlier article on common Java troubleshooting methods, I did not cover how to investigate high CPU iowait, mainly because I had never run into such a case myself. Unfortunately, in the past week I hit two CPU iowait cases in a row, and through them I learned ...
  • CPU IOWAIT analysis

    2019-09-18 07:23:23
    The standby DB of one system shows CPU iowait [6%] higher than the other machines, with no impact; generally iowait only starts to affect the business at around 20%. Background: this site uses a one-primary, two-standby architecture; the primary uses SAS disks and the two standbys use SATA disks ...
  • iostat and iowait

    2012-08-07 16:31:51
    iostat and iowait [repost], October 14th, 2011, posted in Linux systems; author: 深夜的蚊子. %iowait does not reflect a disk bottleneck. What iowait actually measures is CPU time: %iowait = (CPU iowait time) / (all CPU time). This ...
  • Knowledge point: understanding %iowait

    2019-03-07 20:02:16
    %iowait means the percentage of a sampling period that falls under the following condition: the CPU is idle and there are still outstanding I/O requests. There are two common misconceptions about %iowait: one is assuming that %iowait is time during which the CPU cannot work, the other is assuming that %iowait means I/O has ...
  • Excessive CPU Iowait time

    2018-06-06 14:28:46
    Problem: the server's Iowait time reached 60%. 2. Troubleshooting: 1. The top command showed that the server's Iowait time was very high, severely hurting performance. [root@zhangwan22222222 ~]# top top - 15:07:40 up 2 days, 23:35, 10 users, load average: 5....
  • Understanding %IOWAIT (%WIO)

    2021-04-02 16:54:32
    %iowait is a metric shown by tools such as “sar -u” when checking CPU utilization; it appears as %iowait on Linux and as %wio on some Unix versions, with the same meaning. The metric is often misread, and many people take it as a sign of I/O problems; every once in a while I ...
  • Checking iowait

    2013-03-05 09:58:00
    Recently I needed to check a machine's iowait and then archive and report it. Surely there is no need to resort to iostat or top for that; it can be computed simply from the data in /proc/stat. cpu 112216782 86129 367346873 4215542628 637164767 0 117780861 12842902 cpu0...
  • High iowait driving up CPU load

    2020-11-04 11:00:19
    A 12-core machine used as a k8s worker node: top shows us at only 5% but wa at 48% ... yet iostat shows almost no load on the disks, %util is basically 0 with hardly any reads or writes; I don't know what to check next to find out why iowait is so high.
