## python-flops source code (Python)

Computational performance: https://en.wikipedia.org/wiki/FLOPS

FLOPS (floating-point operations per second) and MIPS (million instructions per second) are units of measure for the numerical computing performance of a computer. Floating-point operations are typically used in fields such as scientific computational research. MIPS measures a computer's integer performance. Examples of integer operations include data movement (A to B) and value testing (if A = B, then C). MIPS is an adequate benchmark when a computer is used for database queries, word processing, spreadsheets, or running multiple virtual operating systems.[3][4] Frank H. McMahon of the Lawrence Livermore National Laboratory invented the terms FLOPS and MFLOPS (megaFLOPS) so that he could compare the supercomputers of the day by the number of floating-point calculations they performed per second. This was much better than using the prevalent MIPS, because that statistic usually had little bearing on a machine's arithmetic capability.

FLOPS on an HPC system can be calculated using this equation:

$$\text{FLOPS} = \text{racks} \times \frac{\text{nodes}}{\text{rack}} \times \frac{\text{sockets}}{\text{node}} \times \frac{\text{cores}}{\text{socket}} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{FLOPs}}{\text{cycle}}$$

This can be simplified to the most common case, a computer that has exactly 1 CPU:

$$\text{FLOPS} = \text{cores} \times \frac{\text{cycles}}{\text{second}} \times \frac{\text{FLOPs}}{\text{cycle}}$$

FLOPS can be recorded in different measures of precision. For example, the TOP500 supercomputer list ranks computers by 64-bit (double-precision floating-point format) operations per second, abbreviated to FP64.[6] Similar measures are available for 32-bit (FP32, single-precision floating point) and 16-bit (FP16, half-precision floating point) operations.
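
As a rough illustration of the peak-FLOPS relation (cores × clock rate × FLOPs per cycle, scaled by sockets and nodes), here is a minimal sketch. The function name `peak_flops` and the example CPU figures are hypothetical and chosen only to show the arithmetic; real hardware rarely sustains its theoretical peak.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int,
               sockets: int = 1, nodes: int = 1) -> float:
    """Theoretical peak FLOPS: every core retires flops_per_cycle each clock cycle."""
    return nodes * sockets * cores * clock_hz * flops_per_cycle


# Hypothetical CPU: 8 cores at 3.0 GHz, 16 FP64 operations per cycle per core.
single_cpu = peak_flops(cores=8, clock_hz=3.0e9, flops_per_cycle=16)
print(single_cpu / 1e9, "GFLOPS")  # 384.0 GFLOPS
```

The number depends on the precision being counted, since `flops_per_cycle` is usually different for FP64, FP32, and FP16.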

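Note the difference from FLOPs (the number of floating-point operations, used to measure the complexity of an algorithm).
## FLOPs in deep learning: introduction and calculation (not to be confused with FLOPS)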

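FLOPS vs. FLOPs
FLOPS: all uppercase, short for floating point operations per second. It is the number of floating-point operations executed per second, i.e. computing speed, and is a measure of hardware performance.
FLOPs: with a lowercase s, short for floating point operations (the s marks the plural). It is the number of floating-point operations, i.e. the amount of computation, and can be used to measure the complexity of an algorithm or model.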
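
One way to see how the two quantities relate: dividing an amount of work (FLOPs) by a speed (FLOPS) gives a time. The sketch below is only a best-case estimate that ignores memory traffic and utilization; the function name and the hardware figure are hypothetical.

```python
def ideal_runtime_seconds(model_flops: float, hardware_flops_per_second: float) -> float:
    """Lower bound on runtime: amount of work (FLOPs) divided by speed (FLOPS)."""
    if hardware_flops_per_second <= 0:
        raise ValueError("hardware throughput must be positive")
    return model_flops / hardware_flops_per_second


# Hypothetical numbers: a 15.5 GFLOPs forward pass on a device sustaining 10 TFLOPS.
t = ideal_runtime_seconds(15.5e9, 10e12)
print(f"{t * 1e3:.2f} ms")  # 1.55 ms
```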
Computing FLOPs in a fully connected network
Derivation
Take 4 input neurons and 3 output neurons as an example. A single output neuron is computed as

$$y_1 = w_{11}x_1 + w_{21}x_2 + w_{31}x_3 + w_{41}x_4$$

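This takes 4 multiplications and 3 additions, 4 + 3 = 7 operations in total. Generalizing to I input neurons and O output neurons, computing one output neuron takes

$$I + (I - 1) = 2I - 1$$

operations, so the total operation count is

$$FLOPs = (2I - 1) \times O$$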

With a bias term, the computation becomes

$$y_1 = w_{11}x_1 + w_{21}x_2 + w_{31}x_3 + w_{41}x_4 + b_1$$

The bias adds one more addition per output neuron, so the total operation count is

$$FLOPs = 2I \times O$$

Result
The FLOPs of an FC (fully connected) layer are given by the formula below (the -1 applies when bias is not counted; with bias it is dropped):

$$FLOPs = (2 \times I - 1) \times O$$

where:
I = number of input neurons; O = number of output neurons
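
The fully connected formula can be checked with a few lines of Python. This is a minimal sketch for a single input sample; the function name `fc_flops` and its parameters are illustrative, not taken from any library.

```python
def fc_flops(num_inputs: int, num_outputs: int, bias: bool = True) -> int:
    """FLOPs of one fully connected layer for a single input sample."""
    # Each output neuron needs I multiplications and I - 1 additions;
    # a bias contributes one extra addition.
    per_output = 2 * num_inputs - 1 + (1 if bias else 0)
    return per_output * num_outputs


# The 4-input, 3-output example from the derivation.
print(fc_flops(4, 3, bias=False))  # 21
print(fc_flops(4, 3, bias=True))   # 24
```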
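Computing FLOPs in a CNN

The following does not count the operations of the activation function.

Derivation

Let the number of input channels be $C_{in}$, the kernel size be K, the number of output channels be $C_{out}$, and the output feature map size be $H \times W$.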

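A single convolution (one application of the kernel, producing one output value) takes $C_{in}K^2$ multiplications and $C_{in}K^2 - 1$ additions, for a total of

$$C_{in}K^2 + C_{in}K^2 - 1 = 2C_{in}K^2 - 1$$

operations, plus 1 more if bias is counted. Producing the feature map of one channel takes $H \times W$ such convolutions, and $C_{out}$ feature maps have to be produced in total.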
So for one convolutional layer in a CNN the total operation count is (the -1 applies when bias is not counted; with bias it is dropped):

$$FLOPs = (2C_{in}K^2 - 1)HWC_{out}$$

Result
The FLOPs of a convolutional layer are given by the formula below (the -1 applies when bias is not counted; with bias it is dropped):

$$FLOPs = (2C_{in}K^2 - 1)HWC_{out}$$

where:
$C_{in}$ = input channels; K = kernel size; H, W = output feature map size; $C_{out}$ = output channels
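
The convolution formula translates directly into code. The following is a minimal sketch assuming a square kernel, no grouping, and a single input sample; the function name `conv2d_flops` is illustrative. The example layer has the shape of the first VGG16 convolution (3 input channels, 64 output channels, 3×3 kernel, 224×224 output).

```python
def conv2d_flops(c_in: int, kernel_size: int, out_h: int, out_w: int,
                 c_out: int, bias: bool = True) -> int:
    """FLOPs of one convolutional layer for a single input sample."""
    # One output value: C_in * K^2 multiplications and C_in * K^2 - 1 additions,
    # plus one addition if a bias is present.
    per_output = 2 * c_in * kernel_size ** 2 - 1 + (1 if bias else 0)
    # There are H * W output positions for each of the C_out output channels.
    return per_output * out_h * out_w * c_out


print(conv2d_flops(3, 3, 224, 224, 64, bias=True))   # 173408256
print(conv2d_flops(3, 3, 224, 224, 64, bias=False))  # 170196992
```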
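Code and packages for computing FLOPs
torchstat
from torchstat import stat
import torchvision.models as models

# Build VGG16 and print per-layer statistics for a single 3 x 224 x 224 input.
model = models.vgg16()
stat(model, (3, 224, 224))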

        module name  input shape output shape       params memory(MB)              MAdd             Flops   MemRead(B)  MemWrite(B) duration[%]    MemR+W(B)
0        features.0    3 224 224   64 224 224       1792.0      12.25     173,408,256.0      89,915,392.0     609280.0   12845056.0       3.67%   13454336.0
1        features.1   64 224 224   64 224 224          0.0      12.25       3,211,264.0       3,211,264.0   12845056.0   12845056.0       1.83%   25690112.0
2        features.2   64 224 224   64 224 224      36928.0      12.25   3,699,376,128.0   1,852,899,328.0   12992768.0   12845056.0       8.43%   25837824.0
3        features.3   64 224 224   64 224 224          0.0      12.25       3,211,264.0       3,211,264.0   12845056.0   12845056.0       1.45%   25690112.0
4        features.4   64 224 224   64 112 112          0.0       3.06       2,408,448.0       3,211,264.0   12845056.0    3211264.0      11.37%   16056320.0
5        features.5   64 112 112  128 112 112      73856.0       6.12   1,849,688,064.0     926,449,664.0    3506688.0    6422528.0       4.03%    9929216.0
6        features.6  128 112 112  128 112 112          0.0       6.12       1,605,632.0       1,605,632.0    6422528.0    6422528.0       0.73%   12845056.0
7        features.7  128 112 112  128 112 112     147584.0       6.12   3,699,376,128.0   1,851,293,696.0    7012864.0    6422528.0       5.86%   13435392.0
8        features.8  128 112 112  128 112 112          0.0       6.12       1,605,632.0       1,605,632.0    6422528.0    6422528.0       0.37%   12845056.0
9        features.9  128 112 112  128  56  56          0.0       1.53       1,204,224.0       1,605,632.0    6422528.0    1605632.0       7.32%    8028160.0
10      features.10  128  56  56  256  56  56     295168.0       3.06   1,849,688,064.0     925,646,848.0    2786304.0    3211264.0       3.30%    5997568.0
11      features.11  256  56  56  256  56  56          0.0       3.06         802,816.0         802,816.0    3211264.0    3211264.0       0.00%    6422528.0
12      features.12  256  56  56  256  56  56     590080.0       3.06   3,699,376,128.0   1,850,490,880.0    5571584.0    3211264.0       5.13%    8782848.0
13      features.13  256  56  56  256  56  56          0.0       3.06         802,816.0         802,816.0    3211264.0    3211264.0       0.37%    6422528.0
14      features.14  256  56  56  256  56  56     590080.0       3.06   3,699,376,128.0   1,850,490,880.0    5571584.0    3211264.0       4.76%    8782848.0
15      features.15  256  56  56  256  56  56          0.0       3.06         802,816.0         802,816.0    3211264.0    3211264.0       0.37%    6422528.0
16      features.16  256  56  56  256  28  28          0.0       0.77         602,112.0         802,816.0    3211264.0     802816.0       2.56%    4014080.0
17      features.17  256  28  28  512  28  28    1180160.0       1.53   1,849,688,064.0     925,245,440.0    5523456.0    1605632.0       3.66%    7129088.0
18      features.18  512  28  28  512  28  28          0.0       1.53         401,408.0         401,408.0    1605632.0    1605632.0       0.00%    3211264.0
19      features.19  512  28  28  512  28  28    2359808.0       1.53   3,699,376,128.0   1,850,089,472.0   11044864.0    1605632.0       5.50%   12650496.0
20      features.20  512  28  28  512  28  28          0.0       1.53         401,408.0         401,408.0    1605632.0    1605632.0       0.00%    3211264.0
21      features.21  512  28  28  512  28  28    2359808.0       1.53   3,699,376,128.0   1,850,089,472.0   11044864.0    1605632.0       5.49%   12650496.0
22      features.22  512  28  28  512  28  28          0.0       1.53         401,408.0         401,408.0    1605632.0    1605632.0       0.00%    3211264.0
23      features.23  512  28  28  512  14  14          0.0       0.38         301,056.0         401,408.0    1605632.0     401408.0       1.10%    2007040.0
24      features.24  512  14  14  512  14  14    2359808.0       0.38     924,844,032.0     462,522,368.0    9840640.0     401408.0       2.94%   10242048.0
25      features.25  512  14  14  512  14  14          0.0       0.38         100,352.0         100,352.0     401408.0     401408.0       0.00%     802816.0
26      features.26  512  14  14  512  14  14    2359808.0       0.38     924,844,032.0     462,522,368.0    9840640.0     401408.0       2.57%   10242048.0
27      features.27  512  14  14  512  14  14          0.0       0.38         100,352.0         100,352.0     401408.0     401408.0       0.00%     802816.0
28      features.28  512  14  14  512  14  14    2359808.0       0.38     924,844,032.0     462,522,368.0    9840640.0     401408.0       2.19%   10242048.0
29      features.29  512  14  14  512  14  14          0.0       0.38         100,352.0         100,352.0     401408.0     401408.0       0.37%     802816.0
30      features.30  512  14  14  512   7   7          0.0       0.10          75,264.0         100,352.0     401408.0     100352.0       0.37%     501760.0
31          avgpool  512   7   7  512   7   7          0.0       0.10               0.0               0.0          0.0          0.0       0.00%          0.0
32     classifier.0        25088         4096  102764544.0       0.02     205,516,800.0     102,760,448.0  411158528.0      16384.0      10.62%  411174912.0
33     classifier.1         4096         4096          0.0       0.02           4,096.0           4,096.0      16384.0      16384.0       0.00%      32768.0
34     classifier.2         4096         4096          0.0       0.02               0.0               0.0          0.0          0.0       0.37%          0.0
35     classifier.3         4096         4096   16781312.0       0.02      33,550,336.0      16,777,216.0   67141632.0      16384.0       2.20%   67158016.0
36     classifier.4         4096         4096          0.0       0.02           4,096.0           4,096.0      16384.0      16384.0       0.00%      32768.0
37     classifier.5         4096         4096          0.0       0.02               0.0               0.0          0.0          0.0       0.37%          0.0
38     classifier.6         4096         1000    4097000.0       0.00       8,191,000.0       4,096,000.0   16404384.0       4000.0       0.73%   16408384.0
total                                          138357544.0     109.39  30,958,666,264.0  15,503,489,024.0   16404384.0       4000.0     100.00%  783170624.0
============================================================================================================================================================
Total params: 138,357,544
------------------------------------------------------------------------------------------------------------------------------------------------------------
Total memory: 109.39MB
Total Flops: 15.5GFlops
Total MemR+W: 746.89MB
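
The per-layer numbers in the table above can be cross-checked by hand. The sketch below is a reading inferred from the printed values, not from torchstat's documentation: the `MAdd` column appears to count multiplications and additions separately (including the bias addition for convolutions, excluding it for linear layers), while the `Flops` column appears to count one operation per multiply-accumulate (plus the bias for convolutions). All helper names are hypothetical.

```python
def conv_madd(c_in, k, h, w, c_out):
    # Multiplications plus additions with bias: 2 * C_in * K^2 per output value.
    return 2 * c_in * k * k * h * w * c_out


def conv_macs_plus_bias(c_in, k, h, w, c_out):
    # One multiply-accumulate per weight plus one bias addition per output value.
    return (c_in * k * k + 1) * h * w * c_out


def linear_madd(i, o):
    # (2I - 1) * O, i.e. the fully connected formula without bias.
    return (2 * i - 1) * o


def linear_macs(i, o):
    # One multiply-accumulate per weight.
    return i * o


# features.0: 3 -> 64 channels, 3x3 kernel, 224 x 224 output
assert conv_madd(3, 3, 224, 224, 64) == 173_408_256
assert conv_macs_plus_bias(3, 3, 224, 224, 64) == 89_915_392
# features.2: 64 -> 64 channels, 3x3 kernel, 224 x 224 output
assert conv_madd(64, 3, 224, 224, 64) == 3_699_376_128
assert conv_macs_plus_bias(64, 3, 224, 224, 64) == 1_852_899_328
# classifier.0: 25088 -> 4096 features
assert linear_madd(25088, 4096) == 205_516_800
assert linear_macs(25088, 4096) == 102_760_448
print("all checked rows match")
```

Under this reading, the formulas derived in this article correspond to the `MAdd` column, and the reported total of 15.5 GFlops is roughly half of the 30.96 G `MAdd` total.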


## Understanding and computing FLOPS in CNNs

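Related concepts
FLOPS: all uppercase, short for floating point operations per second. It is the number of floating-point operations executed per second, i.e. computing speed, and is a measure of hardware performance.
FLOPs: with a lowercase s, short for floating point operations (the s marks the plural). It is the number of floating-point operations, i.e. the amount of computation, and can be used to measure the complexity of an algorithm or model.
MACs: multiply-accumulate operations (Multiplication and Accumulation). One MAC counts as 2 floating-point operations; hardware support for multiply-accumulate instructions speeds up computation.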
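Computing the operation count
1. conv
def compute_conv2d_flops(mod, input_shape=None, output_shape=None, macs=False):
    # input_shape and output_shape are torch.Size objects in (N, C, H, W) layout.
    _, cin, _, _ = input_shape
    _, _, h, w = output_shape

    # nn.Conv2d stores its weight as (C_out, C_in // groups, kernel_h, kernel_w).
    w_cout, w_cin, w_h, w_w = mod.weight.data.shape

    # Each output channel only sees C_in // groups input channels, which is
    # exactly w_cin. This also covers depthwise convolutions (w_cin == 1).
    assert cin == w_cin * mod.groups
    input_channels = w_cin

    output_channels = w_cout
    # h and w are output dimensions, so the stride is already accounted for.
    flops = h * w * output_channels * input_channels * w_h * w_w

    if not macs:
        # One multiplication and one addition per MAC, plus one bias addition
        # per output element.
        flops_bias = output_shape[1:].numel() if mod.bias is not None else 0
        flops = 2 * flops + flops_bias

    return int(flops)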

2. fc
def compute_fc_flops(mod, input_shape=None, output_shape=None, macs=False):
    # nn.Linear stores its weight as (out_features, in_features).
    ft_out, ft_in = mod.weight.data.shape
    flops = ft_in * ft_out

    if not macs:
        # One bias addition per output feature.
        flops_bias = ft_out if mod.bias is not None else 0
        flops = 2 * flops + flops_bias

    return int(flops)

def compute_bn2d_flops(mod, input_shape=None, output_shape=None, macs=False):
    # subtract, divide, gamma, beta
    flops = 2 * input_shape[1:].numel()

    if not macs:
        flops *= 2

    return int(flops)

3. relu
def compute_relu_flops(mod, input_shape=None, output_shape=None, macs=False):
    # One comparison per element; no multiply-accumulates.
    flops = 0
    if not macs:
        flops = input_shape[1:].numel()

    return int(flops)

4. maxpool
def compute_maxpool2d_flops(mod, input_shape=None, output_shape=None, macs=False):
    # Assumes mod.kernel_size is an int (square window).
    flops = 0
    if not macs:
        flops = mod.kernel_size**2 * output_shape[1:].numel()

    return flops

5. averagepool
def compute_avgpool2d_flops(mod, input_shape=None, output_shape=None, macs=False):
    # Assumes mod.kernel_size is an int (square window).
    flops = 0
    if not macs:
        flops = mod.kernel_size**2 * output_shape[1:].numel()

    return flops

6. softmax
def compute_softmax_flops(mod, input_shape=None, output_shape=None, macs=False):

    nfeatures = input_shape[1:].numel()

    total_exp = nfeatures  # https://stackoverflow.com/questions/3979942/what-is-the-complexity-real-cost-of-exp-in-cmath-compared-to-a-flop
    total_div = nfeatures

    flops = total_div + total_exp

    if not macs:
        # The body of this branch is missing in the source. Assumption: it adds
        # the nfeatures - 1 additions needed to sum the exponentials.
        flops += nfeatures - 1

    return flops

Another way to compute it, which seems more reasonable
'''
A simplified 3-D Tensor (channels, height, width) for convolutional neural networks.
'''
class Tensor(object):
    def __init__(self, c, h, w):
        self.c = c
        self.h = h
        self.w = w

    def equals(self, other):
        return self.c == other.c and self.h == other.h and self.w == other.w

    # The def line of this method is missing in the source; the name is assumed.
    def broadcastable(self, other):
        return (self.c % other.c == 0 or other.c % self.c == 0) and \
               (self.h % other.h == 0 or other.h % self.h == 0) and \
               (self.w % other.w == 0 or other.w % self.w == 0)

'''
Calculate the single-sample inference-time params and FLOPs of a convolutional
neural network with PyTorch-like APIs.
To calculate the params and FLOPs of certain network architecture, CNNCalculator
needs to be inherited and the network needs to be defined as in PyTorch.
For convenience, some basic operators are pre-defined and other modules can be
defined in a similar way. Parameters and FLOPs in Batch Normalization and other
types of layers are also computed. If only Convolutional and Linear layers are
Refer to MobileNet.py for details.
'''
class CNNCalculator(object):
    def __init__(self, only_mac=False):
        self.params = 0
        self.flops = 0
        self.only_mac = only_mac

    def calculate(self, *inputs):
        raise NotImplementedError

    def Conv2d(self, tensor, out_c, size, stride=1, padding=0, groups=1, bias=True, name='conv'):
        if type(size) == int:
            size = (size, size)
        if type(stride) == int:
            stride = (stride, stride)
        # padding_h and padding_w are used below but never defined in the source;
        # normalizing padding the same way as size and stride is assumed.
        if type(padding) == int:
            padding = (padding, padding)
        assert type(size) == tuple and len(size) == 2, 'illegal size parameters'
        assert type(stride) == tuple and len(stride) == 2, 'illegal stride parameters'
        assert type(padding) == tuple and len(padding) == 2, 'illegal padding parameters'
        size_h, size_w = size
        stride_h, stride_w = stride
        padding_h, padding_w = padding

        in_c = tensor.c
        out_h = (tensor.h - size_h + 2 * padding_h) // stride_h + 1
        out_w = (tensor.w - size_w + 2 * padding_w) // stride_w + 1
        assert in_c % groups == 0 and out_c % groups == 0, 'in_c and out_c must be divisible by groups'

        # One multiply-accumulate per weight per output position.
        self.params += out_c * in_c // groups * size_h * size_w
        self.flops += out_c * out_h * out_w * in_c // groups * size_h * size_w
        if bias:
            self.params += out_c
            self.flops += out_c * out_w * out_h

        return Tensor(out_c, out_h, out_w)

    def BatchNorm2d(self, tensor, name='batch_norm'):
        return tensor
        # Batch normalization can be combined with the preceding convolution, so there are no FLOPs.
        # out_c = tensor.c
        # out_h = tensor.h
        # out_w = tensor.w

        # if self.only_mac:
        # self.params += 4 * out_c
        # self.flops += out_c * out_h * out_w
        # return Tensor(out_c, out_h, out_w)

    def ReLU(self, tensor, name='relu'):
        out_c = tensor.c
        out_h = tensor.h
        out_w = tensor.w

        if not self.only_mac:
            self.flops += out_c * out_h * out_w
        return Tensor(out_c, out_h, out_w)

    def Sigmoid(self, tensor, name='sigmoid'):
        out_c = tensor.c
        out_h = tensor.h
        out_w = tensor.w

        if not self.only_mac:
            self.flops += out_c * out_h * out_w
        return Tensor(out_c, out_h, out_w)

    def Pool2d(self, tensor, size, stride=1, padding=0, name='pool'):
        if type(size) == int:
            size = (size, size)
        if type(stride) == int:
            stride = (stride, stride)
        # Same assumption as in Conv2d: padding is normalized to a pair.
        if type(padding) == int:
            padding = (padding, padding)
        assert type(size) == tuple and len(size) == 2, 'illegal size parameters'
        assert type(stride) == tuple and len(stride) == 2, 'illegal stride parameters'
        assert type(padding) == tuple and len(padding) == 2, 'illegal padding parameters'
        size_h, size_w = size
        stride_h, stride_w = stride
        padding_h, padding_w = padding

        out_c = tensor.c
        out_h = (tensor.h - size_h + 2 * padding_h) // stride_h + 1
        out_w = (tensor.w - size_w + 2 * padding_w) // stride_w + 1
        if not self.only_mac:
            self.flops += out_c * out_h * out_w * size_h * size_w
        return Tensor(out_c, out_h, out_w)

    # The bodies of AvgPool2d and MaxPool2d are missing in the source;
    # delegating to Pool2d is an assumption.
    def AvgPool2d(self, tensor, size, stride=1, padding=0, name='avg_pool'):
        return self.Pool2d(tensor, size, stride, padding, name=name)

    def MaxPool2d(self, tensor, size, stride=1, padding=0, name='max_pool'):
        return self.Pool2d(tensor, size, stride, padding, name=name)

    def GlobalAvgPool2d(self, tensor, name='global_avg_pool'):
        size = (tensor.h, tensor.w)
        return self.AvgPool2d(tensor, size)

    def GlobalMaxPool2d(self, tensor, name='global_max_pool'):
        size = (tensor.h, tensor.w)
        return self.MaxPool2d(tensor, size)

    def Linear(self, tensor, out_c, name='fully_connected'):
        in_c = tensor.c
        out_h = tensor.h
        out_w = tensor.w
        assert out_h == 1 and out_w == 1, 'out_h or out_w is greater than 1 in Linear layer.'
        self.params += in_c * out_c
        self.flops += in_c * out_c
        return Tensor(out_c, out_h, out_w)

    def Concat(self, tensors, name='concat'):
        out_c = 0
        out_h = tensors[0].h
        out_w = tensors[0].w
        for tensor in tensors:
            assert tensor.h == out_h and tensor.w == out_w, 'tensor dimensions mismatch in Concat layer.'
            out_c += tensor.c
        return Tensor(out_c, out_h, out_w)

    # The def line for this body is missing in the source; the name and
    # signature are assumed.
    def Add(self, tensor, other, name='add'):
        out_c = tensor.c
        out_h = tensor.h
        out_w = tensor.w
        if not self.only_mac:
            self.flops += out_c * out_h * out_w
        return Tensor(out_c, out_h, out_w)

    def Multi(self, tensor, other, name='multi'):
        # The body of this method is missing in the source.
        raise NotImplementedError

    def SplitBySize(self, tensor, sizes, name='split_by_size'):
        assert sum(sizes) == tensor.c, 'sizes and tensor.c do not match.'
        return [Tensor(c, tensor.h, tensor.w) for c in sizes]
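
To show how a calculator of this kind is meant to be used (inherit from it, then describe the network in `calculate`), here is a self-contained sketch. `MiniCalculator` is a pared-down stand-in that follows the same counting conventions (square kernels only, no groups), and `TinyNet` is a hypothetical network invented for the example; neither comes from the original code or from MobileNet.py.

```python
class Tensor:
    """Shape-only stand-in for a (channels, height, width) tensor."""
    def __init__(self, c, h, w):
        self.c, self.h, self.w = c, h, w


class MiniCalculator:
    """Pared-down calculator that accumulates params and operation counts."""
    def __init__(self):
        self.params = 0
        self.flops = 0

    def Conv2d(self, t, out_c, size, stride=1, padding=0, bias=True):
        # Standard output-size formula for a convolution.
        out_h = (t.h - size + 2 * padding) // stride + 1
        out_w = (t.w - size + 2 * padding) // stride + 1
        self.params += out_c * t.c * size * size
        self.flops += out_c * out_h * out_w * t.c * size * size
        if bias:
            self.params += out_c
            self.flops += out_c * out_h * out_w
        return Tensor(out_c, out_h, out_w)

    def Pool2d(self, t, size, stride):
        out_h = (t.h - size) // stride + 1
        out_w = (t.w - size) // stride + 1
        # One operation per element inside each pooling window.
        self.flops += t.c * out_h * out_w * size * size
        return Tensor(t.c, out_h, out_w)

    def Linear(self, t, out_c):
        assert t.h == 1 and t.w == 1, 'Linear expects a 1 x 1 spatial size'
        self.params += t.c * out_c
        self.flops += t.c * out_c
        return Tensor(out_c, 1, 1)


class TinyNet(MiniCalculator):
    """Hypothetical network: 3x3 convolution, global pooling, linear classifier."""
    def calculate(self, x):
        x = self.Conv2d(x, 16, size=3, stride=1, padding=1)
        x = self.Pool2d(x, size=x.h, stride=1)  # global pooling over a square map
        return self.Linear(x, 10)


net = TinyNet()
out = net.calculate(Tensor(3, 32, 32))
print(out.c, out.h, out.w)    # 10 1 1
print(net.params, net.flops)  # 608 475296
```

Because only shapes flow through the network, the counts are available without allocating any weights or running a forward pass.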

Please credit the source when reposting: https://blog.csdn.net/tbl1234567. Author: 陶表犁
tbl1234567 2020-10-23 13:48:22

