  • Keyframe extraction from video with Python
  • Video processing: keyframe extraction using the interframe Euclidean-distance method. Reads the frames of an AVI video file and obtains keyframes by computing Euclidean distances; running the main file prints the frame numbers of the keyframes in the video.
  • This code uses the videoreader function to extract keyframes from a video by computing histogram differences
  • Video keyframe extraction


    Video keyframe extraction

    Preface

    As the saying goes, good work calls for good records, so now I'm starting to keep one.

    1. What are keyframes, and why extract them?

    1. A video is a sequence of images; its content is far richer, more expressive, and more informative than any single image. Video analysis is usually frame-based, but frames are highly redundant, and naive frame extraction suffers from both missed frames and redundant ones. Keyframe extraction captures the salient features of each shot in the video; it can sharply reduce the time needed for video retrieval and improve retrieval precision.

    2. Definition of a keyframe: if every video frame of a shot is overlaid in a common image coordinate system, the frames' feature vectors trace out a trajectory in feature space; the frames corresponding to characteristic values along that trajectory are called keyframes [1].

    3. Video has a hierarchical structure composed of three logical units: scenes, shots, and frames. Since video retrieval is usually frame-based, extracting a video's keyframes is essential [2].

    2. Keyframe extraction methods

    1. The basic idea: segment the video sequence into shots, extract keyframes within each shot, and then use the keyframes to compute low-level features such as shape, texture, and color; a color-histogram sketch follows below.
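    As a small illustration of the color-feature step (my own sketch, not from the cited papers; the file name is a placeholder):

    import cv2

    # Color feature of a keyframe: a 3-channel histogram with 16 bins per channel.
    frame = cv2.imread("keyframe_0001.jpg")  # placeholder path
    hist = cv2.calcHist([frame], [0, 1, 2], None, [16, 16, 16],
                        [0, 256, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()  # normalized, comparable across frames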

    2. The methods:

    (1) Full image sequence

    The shot-boundary method takes the first and last (or middle) frame of each shot as keyframes, as sketched below. It is simple and suits shots with little activity or unchanging content, but it ignores the visual complexity of the shot content, caps the number of keyframes per shot, and yields keyframes of limited representativeness, so the results are not stable.
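    A minimal sketch of the shot-boundary idea (my own illustration with OpenCV; the file name is a placeholder):

    import cv2

    # Take the first, middle, and last frame of a shot as its keyframes.
    cap = cv2.VideoCapture("shot.mp4")  # placeholder path
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    for label, idx in [("first", 0), ("middle", total // 2), ("last", total - 1)]:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            cv2.imwrite("keyframe_" + label + ".jpg", frame)
    cap.release()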

    (2) Compressed-domain video

    (3) Keyframe extraction with user-defined k clustering plus content analysis [2]

    (4) a. Sampling-based keyframe extraction [3]

    Sampling-based methods pick frames at random, or randomly within fixed time intervals. Simple, but impractical (see the sampling sketch after this list).

    b. Color-feature-based keyframe extraction

    c. Motion-analysis-based keyframe extraction

    d. Shot-boundary-based keyframe extraction (*)

    e. Content-based keyframe extraction (*)

    f. Clustering-based keyframe extraction (probably the best fit here, since the classes I want to recognize are already fixed)
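    The sampling sketch referenced in (4)a (my own illustration; the interval is arbitrary):

    import cv2

    # Naive sampling-based extraction: keep one frame every `step` frames.
    cap = cv2.VideoCapture("input.mp4")  # placeholder path
    step = 100  # fixed sampling interval, chosen arbitrarily
    idx = 0
    kept = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            kept += 1
            cv2.imwrite("sample_%04d.jpg" % kept, frame)
        idx += 1
    cap.release()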

    (5) Keyframe extraction with a 3D-CNN [4]

    The authors propose a semantics-based keyframe extraction algorithm: a hierarchical clustering algorithm makes an initial pass over the video frames; a semantic-relevance step then compares histograms of the initial keyframes to drop redundant ones and fix the final keyframes. Compared with other algorithms, the keyframes extracted this way show relatively low redundancy.

    (6) First extract deep features of the video frames with a convolutional autoencoder and cluster them with K-means; in each cluster, use a sharpness criterion to pick the clearest frame as the initial keyframe; then refine the initial keyframes with a point-density method to obtain the final keyframes for sign-language recognition. [5]

    (7) Keyframe extraction methods generally fall into four classes:

    Class 1: methods based on image content

    Class 2: methods based on motion analysis

    Class 3: keyframe detection based on the point density of trajectory curves

    Class 4, the current mainstream: clustering-based methods

    (8) Interframe difference

    Source: the code below comes from zyb_as's GitHub

    # -*- coding: utf-8 -*-
    """
    帧间最大值法
    Created on Tue Dec  4 16:48:57 2018
    keyframes extract tool
    this key frame extract algorithm is based on interframe difference.
    The principle is very simple
    First, we load the video and compute the interframe difference between each frames
    Then, we can choose one of these three methods to extract keyframes, which are 
    all based on the difference method:
        
    1. use the difference order
        The first few frames with the largest average interframe difference 
        are considered to be key frames.
    2. use the difference threshold
        The frames which the average interframe difference are large than the 
        threshold are considered to be key frames.
    3. use local maximum
        The frames which the average interframe difference are local maximum are 
        considered to be key frames.
        It should be noted that smoothing the average difference value before 
        calculating the local maximum can effectively remove noise to avoid 
        repeated extraction of frames of similar scenes.
    After a few experiment, the third method has a better key frame extraction effect.
    The original code comes from the link below, I optimized the code to reduce 
    unnecessary memory consumption.
    https://blog.csdn.net/qq_21997625/article/details/81285096
    @author: zyb_as
    """ 
    import os
    import sys
    import operator

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import argrelextrema
    
     
    def smooth(x, window_len=13, window='hanning'):
        """smooth the data using a window with requested size.
        
        This method is based on the convolution of a scaled window with the signal.
        The signal is prepared by introducing reflected copies of the signal 
        (with the window size) in both ends so that transient parts are minimized
        in the begining and end part of the output signal.
        
        input:
            x: the input signal 
            window_len: the dimension of the smoothing window
            window: the type of window from 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'
                flat window will produce a moving average smoothing.
        output:
            the smoothed signal
            
        example:
        import numpy as np    
    t = np.linspace(-2, 2, 50)
        x = np.sin(t)+np.random.randn(len(t))*0.1
        y = smooth(x)
        
        see also: 
        
        numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve
        scipy.signal.lfilter
     
        TODO: the window parameter could be the window itself if an array instead of a string   
        """
        print(len(x), window_len)
        if x.ndim != 1:
            raise ValueError("smooth only accepts 1 dimension arrays.")
        if x.size < window_len:
            raise ValueError("Input vector needs to be bigger than window size.")
        if window_len < 3:
            return x
        if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
            raise ValueError("Window must be one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
     
        s = np.r_[2 * x[0] - x[window_len:1:-1],
                  x, 2 * x[-1] - x[-1:-window_len:-1]]
        #print(len(s))
     
        if window == 'flat':  # moving average
            w = np.ones(window_len, 'd')
        else:
            w = getattr(np, window)(window_len)
        y = np.convolve(w / w.sum(), s, mode='same')
        return y[window_len - 1:-window_len + 1]
     
    
    class Frame:
        """Class to hold the id and average interframe difference of each frame."""
        def __init__(self, id, diff):
            self.id = id
            self.diff = diff

        def __lt__(self, other):
            return self.id < other.id

        def __gt__(self, other):
            return other.__lt__(self)

        def __eq__(self, other):
            return self.id == other.id and self.diff == other.diff

        def __ne__(self, other):
            return not self.__eq__(other)
     
     
    def rel_change(a, b):
        x = (b - a) / max(a, b)
        print(x)
        return x
     
        
    if __name__ == "__main__":
        print(sys.executable)
        #Setting fixed threshold criteria
        USE_THRESH = False
        #fixed threshold value
        THRESH = 0.6
        #Setting top-order criteria
        USE_TOP_ORDER = False
        #Setting local maxima criteria
        USE_LOCAL_MAXIMA = True
        #Number of top sorted frames
        NUM_TOP_FRAMES = 50
         
        #Video path of the source file
        videopath = 'pikachu.mp4'
        #Directory to store the processed frames
        dir = './extract_result/'
        os.makedirs(dir, exist_ok=True)
        #smoothing window size
        len_window = int(50)
        
        
        print("target video :" + videopath)
        print("frame save directory: " + dir)
        # load video and compute diff between frames
        cap = cv2.VideoCapture(str(videopath)) 
        curr_frame = None
        prev_frame = None 
        frame_diffs = []
        frames = []
        success, frame = cap.read()
        i = 0 
        while(success):
            luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
            curr_frame = luv
        if curr_frame is not None and prev_frame is not None:
            # mean absolute difference between consecutive frames
            diff = cv2.absdiff(curr_frame, prev_frame)
            diff_sum = np.sum(diff)
            diff_sum_mean = diff_sum / (diff.shape[0] * diff.shape[1])
            frame_diffs.append(diff_sum_mean)
            frames.append(Frame(i, diff_sum_mean))  # avoid shadowing the image variable `frame`
            prev_frame = curr_frame
            i = i + 1
            success, frame = cap.read()   
        cap.release()
        
        # compute keyframe
        keyframe_id_set = set()
        if USE_TOP_ORDER:
            # sort the list in descending order
            frames.sort(key=operator.attrgetter("diff"), reverse=True)
            for keyframe in frames[:NUM_TOP_FRAMES]:
                keyframe_id_set.add(keyframe.id) 
        if USE_THRESH:
            print("Using Threshold")
            for i in range(1, len(frames)):
                if (rel_change(float(frames[i - 1].diff), float(frames[i].diff)) >= THRESH):
                    keyframe_id_set.add(frames[i].id)   
        if USE_LOCAL_MAXIMA:
            print("Using Local Maxima")
            diff_array = np.array(frame_diffs)
            sm_diff_array = smooth(diff_array, len_window)
            frame_indexes = np.asarray(argrelextrema(sm_diff_array, np.greater))[0]
            for i in frame_indexes:
                keyframe_id_set.add(frames[i - 1].id)
                
            plt.figure(figsize=(40, 20))
            plt.locator_params(numticks=100)
            plt.stem(sm_diff_array)
            plt.savefig(dir + 'plot.png')
        
        # save all keyframes as image
        cap = cv2.VideoCapture(str(videopath))
        curr_frame = None
        keyframes = []
        success, frame = cap.read()
        idx = 0
        while(success):
            if idx in keyframe_id_set:
                name = "keyframe_" + str(idx) + ".jpg"
                cv2.imwrite(dir + name, frame)
                keyframe_id_set.remove(idx)
            idx = idx + 1
            success, frame = cap.read()
        cap.release()
    

    Keyframe extraction via motion analysis (optical flow)

    Source: AillenAnthony's GitHub

    # Scripts to try and detect key frames that represent scene transitions
    # in a video. Has only been tried out on video of slides, so is likely not
    # robust for other types of video.
    
    # 1. based on image information
    # 2. based on motion analysis (optical flow)
    
    import cv2
    import argparse
    import json
    import os
    import numpy as np
    import errno
    
    def getInfo(sourcePath):
        cap = cv2.VideoCapture(sourcePath)
        info = {
            "framecount": cap.get(cv2.CAP_PROP_FRAME_COUNT),
            "fps": cap.get(cv2.CAP_PROP_FPS),
            "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            "codec": int(cap.get(cv2.CAP_PROP_FOURCC))
        }
        cap.release()
        return info
    
    
    def scale(img, xScale, yScale):
        res = cv2.resize(img, None,fx=xScale, fy=yScale, interpolation = cv2.INTER_AREA)
        return res
    
    def resize(img, width, height):
        res = cv2.resize(img, (width, height), interpolation = cv2.INTER_AREA)
        return res
    
    #
    # Extract [numCols] dominant colors from an image.
    # Runs KMeans on the pixels and then returns the centroids
    # of the colors.
    #
    def extract_cols(image, numCols):
        # convert to np.float32 matrix that can be clustered
        Z = image.reshape((-1,3))
        Z = np.float32(Z)
    
        # Set parameters for the clustering
        max_iter = 20
        epsilon = 1.0
        K = numCols
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, max_iter, epsilon)
        # cluster (passing None lets OpenCV allocate the label array)
        compactness, labels, centers = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
        labels = labels.ravel()  # kmeans returns shape (N, 1); flatten for boolean indexing
    
        clusterCounts = []
        for idx in range(K):
            count = len(Z[labels == idx])
            clusterCounts.append(count)
    
        #Reverse the cols stored in centers because cols are stored in BGR
        #in opencv.
        rgbCenters = []
        for center in centers:
            bgr = center.tolist()
            bgr.reverse()
            rgbCenters.append(bgr)
    
        cols = []
        for i in range(K):
            iCol = {
                "count": clusterCounts[i],
                "col": rgbCenters[i]
            }
            cols.append(iCol)
    
        return cols
    
    
    #
    # Calculates change data from one frame to the next.
    #
    def calculateFrameStats(sourcePath, verbose=False, after_frame=0):  # difference between adjacent frames
        cap = cv2.VideoCapture(sourcePath)  # open the video
    
        data = {
            "frame_info": []
        }
    
        lastFrame = None
        while(cap.isOpened()):
            ret, frame = cap.read()
            if frame is None:
                break

            frame_number = cap.get(cv2.CAP_PROP_POS_FRAMES) - 1

            # Convert to grayscale, scale down and blur to make
            # the image differences more robust to noise
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # extract grayscale
            gray = scale(gray, 0.25, 0.25)      # scale down to a quarter of the original size
            gray = cv2.GaussianBlur(gray, (9,9), 0.0)   # Gaussian blur

            if frame_number < after_frame:
                lastFrame = gray
                continue


            if lastFrame is not None:

                diff = cv2.subtract(gray, lastFrame)        # subtract the previous frame from the current one

                diffMag = cv2.countNonZero(diff)        # number of pixels whose gray value changed between the two frames
    
                frame_info = {
                    "frame_number": int(frame_number),
                    "diff_count": int(diffMag)
                }
                data["frame_info"].append(frame_info)
    
                if verbose:
                    cv2.imshow('diff', diff)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
    
            # Keep a ref to this frame for differencing on the next iteration
            lastFrame = gray
    
        cap.release()
        cv2.destroyAllWindows()
    
        # compute some stats
        diff_counts = [fi["diff_count"] for fi in data["frame_info"]]
        data["stats"] = {
            "num": len(diff_counts),
            "min": np.min(diff_counts),
            "max": np.max(diff_counts),
            "mean": np.mean(diff_counts),
            "median": np.median(diff_counts),
            "sd": np.std(diff_counts)   # standard deviation of the per-frame changed-pixel counts
        }
        greater_than_mean = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["mean"]]
        greater_than_median = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["median"]]
        greater_than_one_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["sd"] + data["stats"]["mean"]]
        greater_than_two_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > (data["stats"]["sd"] * 2) + data["stats"]["mean"]]
        greater_than_three_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > (data["stats"]["sd"] * 3) + data["stats"]["mean"]]
    
        # additional statistics
        data["stats"]["greater_than_mean"] = len(greater_than_mean)
        data["stats"]["greater_than_median"] = len(greater_than_median)
        data["stats"]["greater_than_one_sd"] = len(greater_than_one_sd)
        data["stats"]["greater_than_three_sd"] = len(greater_than_three_sd)
        data["stats"]["greater_than_two_sd"] = len(greater_than_two_sd)
    
        return data
    
    
    
    #
    # Take an image and write it out at various sizes.
    #
    # TODO: Create output directories if they do not exist.
    #
    def writeImagePyramid(destPath, name, seqNumber, image):
        fullPath = os.path.join(destPath, "full", name + "-" + str(seqNumber) + ".png")
        halfPath = os.path.join(destPath, "half", name + "-" + str(seqNumber) + ".png")
        quarterPath = os.path.join(destPath, "quarter", name + "-" + str(seqNumber) + ".png")
        eighthPath = os.path.join(destPath, "eighth", name + "-" + str(seqNumber) + ".png")
        sixteenthPath = os.path.join(destPath, "sixteenth", name + "-" + str(seqNumber) + ".png")

        hImage = scale(image, 0.5, 0.5)
        qImage = scale(image, 0.25, 0.25)
        eImage = scale(image, 0.125, 0.125)
        sImage = scale(image, 0.0625, 0.0625)

        cv2.imwrite(fullPath, image)
        cv2.imwrite(halfPath, hImage)
        cv2.imwrite(quarterPath, qImage)
        cv2.imwrite(eighthPath, eImage)
        cv2.imwrite(sixteenthPath, sImage)
    
    
    
    #
    # Selects a set of frames as key frames (frames that represent a significant difference in
    # the video, i.e. potential scene changes). Key frames are selected as those frames where the
    # number of pixels that changed from the previous frame is more than a multiple of the
    # standard deviation above the mean number of changed pixels across all interframe changes.
    #
    #
    def detectScenes(sourcePath, destPath, data, name, verbose=False):
        destDir = os.path.join(destPath, "images")
    
        # TODO make sd multiplier externally configurable
        #diff_threshold = (data["stats"]["sd"] * 1.85) + data["stats"]["mean"]
        diff_threshold = (data["stats"]["sd"] * 2.05) + (data["stats"]["mean"])
    
        cap = cv2.VideoCapture(sourcePath)
        for index, fi in enumerate(data["frame_info"]):
            if fi["diff_count"] < diff_threshold:
                continue

            cap.set(cv2.CAP_PROP_POS_FRAMES, fi["frame_number"])
            ret, frame = cap.read()
            if frame is None:
                continue

            # extract dominant colors
            small = resize(frame, 100, 100)
            cols = extract_cols(small, 5)
            data["frame_info"][index]["dominant_cols"] = cols

            writeImagePyramid(destDir, name, fi["frame_number"], frame)

            if verbose:
                cv2.imshow('extract', frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
    
        cap.release()
        cv2.destroyAllWindows()
        return data
    
    
    def makeOutputDirs(path):
        # exist_ok=True behaves like mkdir -p: it does not fail when a
        # folder along the path already exists
        os.makedirs(os.path.join(path, "metadata"), exist_ok=True)
        os.makedirs(os.path.join(path, "images", "full"), exist_ok=True)
        os.makedirs(os.path.join(path, "images", "half"), exist_ok=True)
        os.makedirs(os.path.join(path, "images", "quarter"), exist_ok=True)
        os.makedirs(os.path.join(path, "images", "eighth"), exist_ok=True)
        os.makedirs(os.path.join(path, "images", "sixteenth"), exist_ok=True)
    
    
    if __name__ == '__main__':
    
        parser = argparse.ArgumentParser()
    
        # parser.add_argument('-s','--source', help='source file', required=True)
        # parser.add_argument('-d', '--dest', help='dest folder', required=True)
        # parser.add_argument('-n', '--name', help='image sequence name', required=True)
        # parser.add_argument('-a','--after_frame', help='after frame', default=0)
        # parser.add_argument('-v', '--verbose', action='store_true')
        # parser.set_defaults(verbose=False)
    
        parser.add_argument('-s','--source', help='source file', default="dataset/video/wash_hand/00000.mp4")
        parser.add_argument('-d', '--dest', help='dest folder', default="dataset/video/key_frame")
        parser.add_argument('-n', '--name', help='image sequence name', default="")
        parser.add_argument('-a','--after_frame', help='after frame', default=0)
        parser.add_argument('-v', '--verbose', action='store_true')
        parser.set_defaults(verbose=False)
    
        args = parser.parse_args()
    
        if args.verbose:
            info = getInfo(args.source)
            print("Source Info: ", info)
    
        makeOutputDirs(args.dest)
    
        # Run the extraction
        data = calculateFrameStats(args.source, args.verbose, int(args.after_frame))
        data = detectScenes(args.source, args.dest, data, args.name, args.verbose)
        keyframeInfo = [frame_info for frame_info in data["frame_info"] if "dominant_cols" in frame_info]
    
        # Write out the results
        data_fp = os.path.join(args.dest, "metadata", args.name + "-meta.json")
        with open(data_fp, 'w') as f:
            data_json_str = json.dumps(data, indent=4)
            f.write(data_json_str)
    
        keyframe_info_fp = os.path.join(args.dest, "metadata", args.name + "-keyframe-meta.json")
        with open(keyframe_info_fp, 'w') as f:
            data_json_str = json.dumps(keyframeInfo, indent=4)
            f.write(data_json_str)
    

    (9) Keyframe extraction with ffmpeg

    The code can be found here
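    Not the linked code, but a typical invocation keeps only the codec's intra-coded (I) frames with the select filter, here wrapped in Python (a hedged sketch; the file names are placeholders):

    import subprocess

    # Keep only I-frames; -vsync vfr writes one image per selected frame.
    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-vf", "select=eq(pict_type\\,I)",
        "-vsync", "vfr",
        "keyframe-%03d.png",
    ], check=True)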

    (10) K-means clustering

    Source code available here

    filenames=dir('images/*.jpg');
    %file_name = fly-1;
    num=size(filenames,1);  %number of image files in images/
    key=zeros(1,num);  %key(i)=1 will mark frame i as a keyframe
    cluster=zeros(1,num);   %cluster(i) = index of the cluster frame i belongs to
    clusterCount=zeros(1,num);  %number of frames in each cluster
    count=0;        %number of clusters
     
    %threshold=0.75;  %a larger threshold keeps more frames
    %for the airplane video 0.93 is a suitable threshold; 0.95 is even better
    %********************************************************threshold**************************************************************%
    threshold=0.91;  %similarity threshold
    centrodR=zeros(num,256);   %R histogram of each cluster centroid; adjusted whenever a similar frame joins the cluster
    centrodG=zeros(num,256);   %G histogram of each cluster centroid
    centrodB=zeros(num,256);   %B histogram of each cluster centroid
     
    if num==0
        error('Sorry, there are no pictures in the images folder!');
    else
        %the first frame forms the first cluster
        img=imread(strcat('images/',filenames(1).name));
        count=count+1;    %create the first cluster
        [preCountR,x]=imhist(img(:,:,1));   %red histogram: 256 bins whose heights are the pixel counts
        [preCountG,x]=imhist(img(:,:,2));   %green histogram
        [preCountB,x]=imhist(img(:,:,3));   %blue histogram
        
        cluster(1)=1;   %the keyframe of the first cluster starts as the first frame
        clusterCount(1)=clusterCount(1)+1;  %cluster 1 now holds one frame
        centrodR(1,:)=preCountR; %the centroid of cluster 1 starts as the histograms of frame 1
        centrodG(1,:)=preCountG;
        centrodB(1,:)=preCountB;
       
        visit = 1;
        for k=2:num
            img=imread(strcat('images/',filenames(k).name));  %read each frame in turn, starting from frame 2
            [tmpCountR,x]=imhist(img(:,:,1));   %red histogram of the current frame
            [tmpCountG,x]=imhist(img(:,:,2));   %green histogram
            [tmpCountB,x]=imhist(img(:,:,3));   %blue histogram
     
            clusterGroupId=1;  %cluster the current frame will join
            maxSimilar=0;   %best similarity found so far
        
           
            for clusterCountI= visit:count          %find the existing cluster most similar to this frame
                sR=0;
                sG=0;
                sB=0;
                %histogram-intersection similarity
                for j=1:256
                    sR=min(centrodR(clusterCountI,j),tmpCountR(j))+sR;  %sum the bin-wise minimum of the centroid and frame histograms
                    sG=min(centrodG(clusterCountI,j),tmpCountG(j))+sG;
                    sB=min(centrodB(clusterCountI,j),tmpCountB(j))+sB;
                end
                dR=sR/sum(tmpCountR);
                dG=sG/sum(tmpCountG);
                dB=sB/sum(tmpCountB);
                %YUV: the human eye is most sensitive to Y (luma)
                d=0.30*dR+0.59*dG+0.11*dB;  %weighted similarity
                if d>maxSimilar
                    clusterGroupId=clusterCountI;
                    maxSimilar=d;
                end
            end
            
            if maxSimilar>threshold
                %high similarity, i.e. small distance to this cluster centroid:
                %join the cluster and update its centroid as a running mean
                for ii=1:256    
                    centrodR(clusterGroupId,ii)=centrodR(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountR(ii)*1.0/(clusterCount(clusterGroupId)+1);
                    centrodG(clusterGroupId,ii)=centrodG(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountG(ii)*1.0/(clusterCount(clusterGroupId)+1);
                    centrodB(clusterGroupId,ii)=centrodB(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountB(ii)*1.0/(clusterCount(clusterGroupId)+1);
                end
                clusterCount(clusterGroupId)=clusterCount(clusterGroupId)+1;
                cluster(k)=clusterGroupId;   %frame k belongs to cluster clusterGroupId
            else
                %form a new cluster with a new centroid
                count=count+1;
                visit = visit+1;
                clusterCount(count)=clusterCount(count)+1;
                centrodR(count,:)=tmpCountR;
                centrodG(count,:)=tmpCountG;
                centrodB(count,:)=tmpCountB;
                cluster(k)=count;   %frame k belongs to the new cluster
            end
        end
        
        %at this point every frame belongs to one of the count clusters; frame k is in cluster(k)
        %now pick, for each cluster, the frame closest to its centroid (largest similarity) as its keyframe
        maxSimilarity=zeros(1,count);
        frame=zeros(1,count);
        for i=1:num
            %recompute the histograms of frame i before comparing it with its centroid
            img=imread(strcat('images/',filenames(i).name));
            [tmpCountR,x]=imhist(img(:,:,1));
            [tmpCountG,x]=imhist(img(:,:,2));
            [tmpCountB,x]=imhist(img(:,:,3));
            sR=0;
            sG=0;
            sB=0;
            %histogram-intersection similarity against the cluster centroid
            for j=1:256
                sR=min(centrodR(cluster(i),j),tmpCountR(j))+sR;
                sG=min(centrodG(cluster(i),j),tmpCountG(j))+sG;
                sB=min(centrodB(cluster(i),j),tmpCountB(j))+sB;
            end
            dR=sR/sum(tmpCountR);
            dG=sG/sum(tmpCountG);
            dB=sB/sum(tmpCountB);
            %YUV: the human eye is most sensitive to Y (luma)
            d=0.30*dR+0.59*dG+0.11*dB;
            if d>maxSimilarity(cluster(i))
                maxSimilarity(cluster(i))=d;
                frame(cluster(i))=i;
            end
        end
        
        for j=1:count
            key(frame(j))=1;
            figure(j);
            imshow(strcat('images/',filenames(frame(j)).name));
        end
    end
     
    keyFrameIndexes=find(key)
    

    This method extracted 198 frames out of 878, so the redundancy is still fairly high.

    (11) Use a CNN to extract image features, then apply clustering to pick the keyframes. Because of problems setting up the TensorFlow environment, I never actually ran this; the link is given here:

    this

    3. Summary of results

    I used a 1:15-long video (00000.MP4, 1898 frames, divided into seven classes). The local-maximum interframe-difference method worked very well; k-means clustering did not, because (1) the code was copied from others and never optimized, and (2) my understanding of the clustering theory is too shallow to guide practice.

    k-means clustering extracted 484 frames

    The local-maximum interframe-difference method extracted 35 frames

    References:

    [1]苏筱涵.深度学习视角下视频关键帧提取与视频检索研究[J].网络安全技术与应用,2020(05):65-66.

    [2]王红霞,王磊,晏杉杉.视频检索中的关键帧提取方法研究[J].沈阳理工大学学报,2019,38(03):78-82.

    [3]王俊玲,卢新明.基于语义相关的视频关键帧提取算法[J/OL].计算机工程与应用:1-10[2020-11-04].http://kns.cnki.net/kcms/detail/11.2127.TP.20200319.1706.018.html.

    [4] 张晓宇,张云华.基于融合特征的视频关键帧提取方法.计算机系统应用,2019,28(11):176–181. http://www.c-s-a.org.cn/1003-3254/7163.html

    [5] 周舟,韩芳,王直杰.面向手语识别的视频关键帧提取和优化算法[J/OL].华东理工大学学报(自然科学版):1-8[2020-11-05].https://doi.org/10.14135/j.cnki.1006-3080.20191201002.

    [6] https://me.csdn.net/cungudafa (this blogger's posts)

    Appendix:

    1. Summary of key points in digital audio/video processing


  • Clustering-based keyframe extraction: a brief description of the idea, followed by Matlab code with my own comments. The result displays the keyframe images and their indexes.
  • As a first post, this explains how I score the efficiency of keyframe-extraction algorithms: runtime and accuracy are weighted 6:4, for 10 points in all. For the 4-point accuracy part, each keyframe captured from its scene scores +1; accuracy is judged per scene...
  • Extracting key shots and keyframes from video

    Based on the playback content of a video, extract key shots and keyframes; cut and segment the video to ease storage and retrieval
  • Keyframe extraction research based on adaptive-threshold clustering (.pdf)
  • Keyframe extraction with FFMPEG

    Extract video keyframes with FFMPEG and save them as images
  • 《结合主成分分析和聚类的关键帧提取》 (Keyframe Extraction Combining Principal Component Analysis and Clustering), by 许文竹 and 徐立鸿. Clustering is not computationally cheap, so we reduce the dimensionality of the data before clustering. ...

    Based on
    《结合主成分分析和聚类的关键帧提取》 (Keyframe Extraction Combining Principal Component Analysis and Clustering)
    by
    许文竹, 徐立鸿

    Clustering has nontrivial complexity, so we first reduce the dimensionality of the data.

    Principal component analysis (PCA)

    A brief introduction

    Here we use PCA to extract image features.

    PCA is mainly used for dimensionality reduction: for a high-dimensional vector it finds a projection matrix that maps the features from high to low dimension while still reflecting the content of the image.

    Concrete procedure

    Flattening a $w \times h$ image gives a vector of shape $(1, wh)$, which is high-dimensional. Let $X$ hold all the frames; its shape is $(n, wh)$.

    We compute the overall covariance matrix $\Sigma = \frac{1}{N}\sum_{i=1}^{N}(X_i-u)(X_i-u)^T$, where $u$ is the mean frame.

    Given $\Sigma$, its eigenvalues and eigenvectors can be obtained via QR or SVD.
    With eigenvalues $\lambda_i$, choose $L$ (the reduced dimension) as the smallest value satisfying $\alpha \leq \frac{\sum_{i=1}^{L}\lambda_i}{\sum_{i=1}^{wh}\lambda_i}$, where $\alpha$ is taken between $0.90$ and $0.99$.

    Because the covariance matrix is too large to form explicitly, we cannot compute its eigenvalues directly; instead we compute the singular values of $X$ via SVD and use them for the selection. The final $L$ must also be capped at the number of images, since an $n \times wh$ matrix has at most $\min(n, wh)$ nonzero singular values.
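    A minimal numpy sketch of this selection step (my own illustration, not the paper's code; pca_reduce and alpha are hypothetical names):

    import numpy as np

    def pca_reduce(X, alpha=0.90):
        # X: frames flattened to shape (n, w*h)
        Xc = X - X.mean(axis=0)                      # center the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        energy = np.cumsum(S) / np.sum(S)
        L = int(np.searchsorted(energy, alpha)) + 1  # smallest L whose energy ratio >= alpha
        L = min(L, min(X.shape))                     # cap at the rank bound
        return Xc @ Vt[:L].T                         # reduced features, shape (n, L)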

    Clustering

    With the image features in hand, we cluster the frames using k-means, with a few caveats:
    1. The clustering does not search for centers globally; each frame is compared only against the cluster centers near it.
    2. Choice of $k$: when the mean frame difference is $\leq 3500$, the video changes slowly overall, but there may still be bursts of local motion, so $k=\max(k_1,k_2)$, where $k_1$ is a proportional count, $k_1 = n/100$, and $k_2$ is the number of frames whose difference exceeds a threshold $T$, set here to $13000$.
    3. When the mean difference is $>3500$, $k=k_1=n/50$.
    4. To prevent excessive iteration, we cap the iteration count at $100$.

    This yields the final result. The code below is written but untested:

    import numpy as np
    from sklearn.decomposition import PCA
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,255,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals (start and end frame numbers)
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = int(cap.get(7))  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    len_windows = 0
    local_windows = 0
     
    def smooth(swift_img,windows):
        r = swift_img.shape[1]
        c = swift_img.shape[2]
        for i in range(r):
            for j in range(c):
                L = swift_img[:,i,j]
                L = np.convolve(L,np.ones(windows),'same')
                swift_img[:,i,j] = L
        return swift_img
        
    def get_block(img):
        img = np.array(img)
        img = img.ravel()
        return img
        
    def get_img(now_time = 0, get_number = Frame_number):  # parameters make experiments easier
        swift_img = []  # converted frames
        index = 0  # current frame index
        time = now_time  # current timestamp in milliseconds
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC, time)
            ret, img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
            if index % 50 == 0:
                print("reached frame " + str(index))
        swift_img = np.array(swift_img, dtype=np.float32)  # float avoids uint8 wrap-around in the later differences
        return swift_img
    
    def get_key_frame(Change):
        diff_C = Change[1:] - Change[:-1]
        mid_d = np.zeros(Change.shape[1])
        for i in range(diff_C.shape[1]):
            mid_d[i] = np.mean(diff_C[:,i])
        mid_d = np.sum(np.abs(mid_d))
        k = 0
        k1 = 0
        k2 = 0 
        T = 13000
        # choose the number of clusters k
        # mid_d <= 3500 means the video content changes slowly
        print(mid_d)
        if mid_d <= 3500:
            k1 = Frame_number / 100
            k2 = np.sum(diff_C >= T)
            k = max(k1, k2)
        else:
            k1 = Frame_number / 50
            k2 = np.sum(diff_C >= T)
            k = k1
        k = int(k)
        print(k)
        # with the number of keyframes fixed, extract them by clustering
        Cluster = []
        set_cluster = []
        now = 0
        for i in range(k):
            if now >= Frame_number:
                now = Frame_number
            Cluster.append(Change[now-1])
            set_cluster.append({now})
            now += int(Frame_number / k)
        cnt = 0  # guard against excessive iterations
        while True:
            cnt += 1
            now = 0  # index of the cluster currently being assigned
            for i in range(k):
                set_cluster[i].clear()  # empty every cluster first
            for i in range(Frame_number):
                l = now 
                r = min(now + 1,k-1)
                ldiff = np.mean(abs(Cluster[l] - Change[i]))
                rdiff = np.mean(abs(Cluster[r] - Change[i]))
                if ldiff < rdiff:
                    set_cluster[l].add(i)
                else :
                    set_cluster[r].add(i)
                    now = r
            ok = True
            for i in range(k):
                Len = len(set_cluster[i])
                if Len == 0:
                    continue
                set_sum = np.zeros(Change.shape[1])
                for x in set_cluster[i]:
                    set_sum = set_sum + Change[x]
                set_sum /= Len
                if np.mean(abs(Cluster[i]-set_sum)) < 1e-10:
                    continue
                ok = False  
                Cluster[i] = set_sum
            print("第"+str(cnt)+"次聚类")
            if cnt >= 100 or ok == True:
                break
        TL = []
        for i in range(int(Frame_number)):
            TL.append(False)
        for i in range(k):
            MIN = 1e20
            for x in set_cluster[i]:
                MIN = min(MIN,np.mean(np.abs(Change[x] - Cluster[i])))
            for x in set_cluster[i]:
                if abs(MIN - np.mean(np.abs(Change[x] - Cluster[i]))) < 1e-10:
                    TL[x] = True
                    break
        TL = np.array(TL)
        return TL
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i
            ret, img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num), img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        right = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if j == 0 :
                    print(i)
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                right += 1
                if num == 1:
                    rate.append(6.0)
                    continue
                add += num - 1
                rate.append(6.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("多余的个数:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("多余的占比:")
        print(add)
        print("正确的评分:")
        print(right)
        ret += 4 * (1 - add) * right / (r - l + 1)#总共帧数中只有正确的部分才考虑时间因素。
        print("评分是:")
        print(ret)
        return ret
    
    def study():
        # grid-search the smoothing-window and local-max-window sizes
        # (leftover from the frame-difference variant; get_key_frame here takes
        # no window argument, so this helper is not called in this script)
        window = 1
        local = 2
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(10):
            tmp = 1 + i
            for j in range(10):
                Tmp = 2 + j
                print("current parameters: smoothing window " + str(tmp) + ", local-max window " + str(Tmp))
                tmp_img = get_img(ansl[lindex], ansr[rindex])
                tmp_img = smooth(tmp_img, tmp)
                tmp_L = get_key_frame(tmp_img, Tmp)
                ttmp = cal_ans(tmp_L, lindex, rindex)
                if ttmp > mmax:
                    window = tmp
                    local = Tmp
                    mmax = ttmp
                print("--------------------")
        return window, local
        return window,local
    
    def PCA_get_feature(X):
        # choose the number of components k from the singular-value energy
        # mean of each feature
        mean_X = X.mean(axis = 0)
        X = X - mean_X
        # center the data
        k = 1
        U,S,V = np.linalg.svd(X, full_matrices = False)
        index = 0
        S_sum = np.sum(S)
        now = 0
        P = 0
        while True:
            now += S[index]
            index += 1
            if now >= 0.90 * S_sum:
                P = index / S.shape[0]
                break  # stop once the energy ratio reaches 0.90
            if index == S.shape[0]:
                P = 1.0
                break
        k = int(P * min(X.shape[1], X.shape[0]))
        # number of dimensions to keep
        pca = PCA(n_components = k)
        new_x = pca.fit_transform(X)  # fit on the centered X and return the reduced data
        return new_x
    
    swift_img = get_img()
    Frame_number = int(swift_img.shape[0])
    #Change = PCA_get_feature(swift_img)
    cal_L = get_key_frame(swift_img)
    print("done")
    cal_ans(cal_L, 0, ansl.shape[0] - 1)
    
    

    The initial score is 4 points.
    Clearly, seeding the cluster centers by fixed segments is not reasonable enough.
    Seeding the centers proportionally to the frame differences reaches 6.01 points.
    Seeding them at extremum points reaches 7.91 points.

    import numpy as np
    from sklearn.decomposition import PCA
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,355,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals (start and end frame numbers)
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = int(cap.get(7))  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    len_windows = 0
    local_windows = 0
     
    def smooth(swift_img,windows):
        r = swift_img.shape[1]
        c = swift_img.shape[2]
        for i in range(r):
            for j in range(c):
                L = swift_img[:,i,j]
                L = np.convolve(L,np.ones(windows),'same')
                swift_img[:,i,j] = L
        return swift_img
        
    def get_block(img):
        img = np.array(img)
        img = img.ravel()
        return img
        
    def get_img(now_time = 0, get_number = Frame_number):  # parameters make experiments easier
        swift_img = []  # converted frames
        index = 0  # current frame index
        time = now_time  # current timestamp in milliseconds
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC, time)
            ret, img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
            if index % 50 == 0:
                print("reached frame " + str(index))
        swift_img = np.array(swift_img, dtype=np.float32)  # float avoids uint8 wrap-around in the later differences
        return swift_img
    
    def get_key_frame(Change):
        diff_C = Change[1:] - Change[:-1]
        mid_d = np.zeros(Change.shape[1])
        for i in range(diff_C.shape[1]):
            mid_d[i] = np.mean(diff_C[:,i])
        mid_d = np.sum(np.abs(mid_d))
        k = 0
        k1 = 0
        k2 = 0 
        T = 13000
        # choose the number of clusters k
        # mid_d <= 3500 means the video content changes slowly
        print(mid_d)
        if mid_d <= 3500:
            k1 = Frame_number / 100
            k2 = np.sum(diff_C >= T)
            k = max(k1, k2)
        else:
            k1 = Frame_number / 50
            k2 = np.sum(diff_C >= T)
            k = k1
        k = int(k)
        print(k)
        # with the number of keyframes fixed, extract them by clustering
        Cluster = []
        set_cluster = []
        now = 0
        for i in range(k):
            if now >= Frame_number - 2:
                now = Frame_number - 2 
            Cluster.append(Change[now])
            set_cluster.append({now})
            if(np.sum(np.abs(diff_C[now]))>mid_d):
                now += int(Frame_number / (3 * k))
            else:
                now += int(Frame_number / k)
        cnt = 0  # guard against excessive iterations
        while True:
            cnt += 1
            now = 0  # index of the cluster currently being assigned
            for i in range(k):
                set_cluster[i].clear()  # empty every cluster first
            for i in range(Frame_number):
                l = now 
                r = min(now + 1,k-1)
                ldiff = np.mean(abs(Cluster[l] - Change[i]))
                rdiff = np.mean(abs(Cluster[r] - Change[i]))
                if ldiff < rdiff:
                    set_cluster[l].add(i)
                else :
                    set_cluster[r].add(i)
                    now = r
            ok = True
            for i in range(k):
                Len = len(set_cluster[i])
                if Len == 0:
                    continue
                set_sum = np.zeros(Change.shape[1])
                for x in set_cluster[i]:
                    set_sum = set_sum + Change[x]
                set_sum /= Len
                if np.mean(abs(Cluster[i]-set_sum)) < 1e-10:
                    continue
                ok = False  
                Cluster[i] = set_sum
            print("第"+str(cnt)+"次聚类")
            if cnt >= 100 or ok == True:
                break
        TL = []
        for i in range(int(Frame_number)):
            TL.append(False)
        for i in range(k):
            MIN = 1e20
            for x in set_cluster[i]:
                MIN = min(MIN,np.mean(np.abs(Change[x] - Cluster[i])))
            for x in set_cluster[i]:
                if abs(MIN - np.mean(np.abs(Change[x] - Cluster[i]))) < 1e-10:
                    TL[x] = True
                    break
        TL = np.array(TL)
        return TL
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i
            ret, img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num), img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        right = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if j == 0 :
                    print(i)
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                right += 1
                if num == 1:
                    rate.append(6.0)
                    continue
                add += num - 1
                rate.append(6.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("多余的个数:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("多余的占比:")
        print(add)
        print("正确的评分:")
        print(right)
        ret += 4 * (1 - add) * right / (r - l + 1)#总共帧数中只有正确的部分才考虑时间因素。
        print("评分是:")
        print(ret)
        return ret
    
    def study():
        # grid-search the smoothing-window and local-max-window sizes
        # (leftover from the frame-difference variant; get_key_frame here takes
        # no window argument, so this helper is not called in this script)
        window = 1
        local = 2
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(10):
            tmp = 1 + i
            for j in range(10):
                Tmp = 2 + j
                print("current parameters: smoothing window " + str(tmp) + ", local-max window " + str(Tmp))
                tmp_img = get_img(ansl[lindex], ansr[rindex])
                tmp_img = smooth(tmp_img, tmp)
                tmp_L = get_key_frame(tmp_img, Tmp)
                ttmp = cal_ans(tmp_L, lindex, rindex)
                if ttmp > mmax:
                    window = tmp
                    local = Tmp
                    mmax = ttmp
                print("--------------------")
        return window, local
    
    def PCA_get_feature(X):
        # choose the number of components k from the singular-value energy
        # mean of each feature
        mean_X = X.mean(axis = 0)
        X = X - mean_X
        # center the data
        k = 1
        U,S,V = np.linalg.svd(X, full_matrices = False)
        index = 0
        S_sum = np.sum(S)
        now = 0
        P = 0
        while True:
            now += S[index]
            index += 1
            if now >= 0.90 * S_sum:
                P = index / S.shape[0]
                break  # stop once the energy ratio reaches 0.90
            if index == S.shape[0]:
                P = 1.0
                break
        k = int(P * min(X.shape[1], X.shape[0]))
        # number of dimensions to keep
        pca = PCA(n_components = k)
        new_x = pca.fit_transform(X)  # fit on the centered X and return the reduced data
        return new_x
    
    swift_img = get_img()
    Frame_number = int(swift_img.shape[0])
    #Change = PCA_get_feature(swift_img)
    cal_L = get_key_frame(swift_img)
    print("done")
    cal_ans(cal_L, 0, ansl.shape[0] - 1)
    cal_ans(cal_L,0,ansl.shape[0]-1)
    
    
    展开全文
  • Video keyframe extraction based on interframe difference

    In many scenarios we do not want to, or cannot, process every frame of a video. Instead we want to pull out a handful of important frames, a process called video keyframe extraction.

    There are many keyframe extraction algorithms; the right implementation depends mainly on how you define a keyframe.

    In other words: for your actual application, what kind of picture counts as a keyframe?

    Today I implemented a fairly general keyframe extraction algorithm based on interframe difference.

    The principle is simple: the mean pixel intensity of the difference between two frames measures how much the picture changed. Whenever a frame differs strongly from the one before it, we treat it as a keyframe and extract it.

    The pipeline, briefly:

    First, read the video and compute the difference between each pair of consecutive frames, yielding a mean interframe difference intensity per frame.
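    A minimal sketch of this step (my own condensation of the full code earlier on this page; the path is a placeholder):

    import cv2
    import numpy as np

    # Mean absolute difference between consecutive frames (in LUV space).
    cap = cv2.VideoCapture("input.mp4")  # placeholder path
    diffs, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
        if prev is not None:
            d = cv2.absdiff(curr, prev)
            diffs.append(np.sum(d) / (d.shape[0] * d.shape[1]))
        prev = curr
    cap.release()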

    Then choose one of the following three methods, all based on the interframe difference, to extract keyframes:

    1. Ranking by difference intensity

    Sort all frames by mean interframe difference and take the top few as the video's keyframes.

    2. Difference threshold

    Take the frames whose mean interframe difference exceeds a preset threshold.

    3. Local maxima

    Take the frames at local maxima of the mean interframe difference.

    This method gives richer results, with keyframes spread evenly across the video.

    Note that when using it, smoothing the time series of mean interframe differences first is an effective trick: it removes noise and keeps several near-identical frames of the same scene from all being extracted as keyframes.

    I recommend the third method; a short sketch of it follows.
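    Continuing the sketch above: smooth the difference series, then take its local maxima as keyframe indexes (mirroring the smooth()/argrelextrema approach of the full code earlier on this page; the window size is arbitrary):

    import numpy as np
    from scipy.signal import argrelextrema

    diffs = np.asarray(diffs)  # from the previous sketch
    window = 13                # smoothing window size, chosen arbitrarily
    kernel = np.hanning(window)
    smoothed = np.convolve(kernel / kernel.sum(), diffs, mode="same")
    # diffs[i] compares frames i and i+1, so shift by one to index frames
    keyframe_ids = argrelextrema(smoothed, np.greater)[0] + 1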

    Source code download:

    https://github.com/monkeyDemon/AI-Toolbox/tree/master/preprocess%20ToolBox/keyframes_extract_tool

    The original code came from here:

    However, it had a problem: reading a video larger than 100 MB caused an out-of-memory error, so I optimized it to remove unnecessary memory consumption.

    I ran an experiment on a classic Pokémon clip; the smoothed mean interframe difference is shown below:

    img01

    Some of the extracted keyframes:

    img02

    Not bad, right?

    This was only a quick exploration of video keyframe extraction, but the results met the needs of my actual work. If you know this area well, or know better methods, I look forward to hearing from you.

    Finally, if you are interested in the algorithms, feel free to follow my GitHub project AI-Toolbox.

    The project aims to boost efficiency and iterate quickly on new ideas; contributions are welcome.

    This article comes from my teammate 安晟; his GitHub is:

    https://github.com/monkeyDemon

    After more than eight years, the LSGO software team has built up rich R&D experience in geographic information systems, statistical data analysis, and computer vision, along with a complete talent-development system.

    Students interested in algorithm design and implementation are welcome to join us and grow with us.

    展开全文
  • All local extremum points are keyframes. The final score is 6.89.

    Based on
    《Key Frame Extraction of Online Video Based on Optimized Frame Difference》
    by
    Huayong Liu, Wenting Meng, et al.

    Frame difference

    First, smooth the frames to suppress tiny differences.
    Next, compute the mean interframe difference.
    Finally, find the local extrema. Because there is so much data, we need an extremum search with a local window:
    a frame counts as an extremum only if it is the maximum within its $[l, r]$ window.

    All such extremum points are keyframes.
    The final score is $6.89$.

    import numpy as np
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,255,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals (start and end frame numbers)
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = int(cap.get(7))  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    len_windows = 0
    local_windows = 0
     
    def smooth(swift_img,windows):
        r = swift_img.shape[1]
        c = swift_img.shape[2]
        for i in range(r):
            for j in range(c):
                L = swift_img[:,i,j]
                L = np.convolve(L,np.ones(windows),'same')
                swift_img[:,i,j] = L
        return swift_img
    
    def get_block(img):
        img = np.array(img)
        return img
    
    def dif(img1,img2):
        diff = img1 - img2
        diff = np.abs(np.array(diff))
        diff = diff.mean()
        return diff
        
    def get_img(now_time = 0, get_number = Frame_number):  # parameters make experiments easier
        swift_img = []  # converted frames
        index = 0  # current frame index
        time = now_time  # current timestamp in milliseconds
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC, time)
            ret, img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
            if index % 50 == 0:
                print("reached frame " + str(index))
        swift_img = np.array(swift_img, dtype=np.float32)  # float avoids uint8 wrap-around in the frame differences
        return swift_img
    
    def get_key_frame(swift_img,local_windows):
        L = []
        L.append(0)
        for i in range(swift_img.shape[0]-1):
            temp = dif(swift_img[i],swift_img[i+1])
            L.append(temp)
        L = np.array(L)
        TL = []
        for i in range(L.shape[0]):
            l = i - local_windows // 2
            r = i + local_windows // 2
            l = max(l,0)
            r = min(r,L.shape[0])
            if i == l + np.argmax(L[l:r]):
                TL.append(True)
            else:
                TL.append(False)
    #    print(TL)
        TL = np.array(TL)
        return TL
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i
            ret, img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num), img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        right = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if j == 0 :
                    print(i)
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                right += 1
                if num == 1:
                    rate.append(6.0)
                    continue
                add += num - 1
                rate.append(6.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("多余的个数:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("多余的占比:")
        print(add)
        print("正确的评分:")
        print(right)
        ret += 4 * (1 - add) * right / (r - l + 1)#总共帧数中只有正确的部分才考虑时间因素。
        print("评分是:")
        print(ret)
        return ret
    
    def study():
        window = 1
        local = 2
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(10):
            tmp = 1 + i
            for j in range(10):
                Tmp = 2 + j
                print("当前参数: "+"卷积窗口"+str(tmp)+"最值窗口"+str(Tmp))
                tmp_img = get_img(ansl[lindex],ansr[rindex])
                tmp_img = smooth(tmp_img,tmp)
                tmp_L = get_key_frame(tmp_img,Tmp)
                ttmp = cal_ans(tmp_L,lindex,rindex)
                if ttmp > mmax:
                    window = tmp
                    local = Tmp
                    mmax = ttmp
                print("分割线--------------------")
        return window,local
    
    len_windows, local_windows = study()
    print("final window sizes:")
    print(len_windows)
    print(local_windows)
    print("")
    
    swift_img = get_img()
    swift_img = smooth(swift_img,len_windows)
    cal_L= get_key_frame(swift_img,local_windows)
    cal_ans(cal_L,0,ansl.shape[0]-1)
    #preserve(cal_L)
    
    
    # result:
    # 6.89
    
    展开全文
  • CV | Extracting keyframes from video with Python + OpenCV

    I have already written two posts on extracting keyframes from video, but today I revised and improved the code again; it feels better than all the earlier versions, so here once more is the Python code for extracting keyframes with OpenCV. The program automatically...
  • Keyframe extraction with ffmpeg

    A key frame in animation is a drawing that defines the start and end points of a smooth transition... For how to extract them, see video keyframe extraction: ffmpeg -i video_name.mp4 -vf select='eq(pict_type\,I)' -vsync 2 -s 1920*1080 -f image2 core-%02d.jp
  • MPEG keyframe extraction program

    MATLAB keyframe extraction for MPEG video; works well
  • Simply put, an I-frame is a keyframe and uses intra-frame compression, the same as AVI compression. P means forward search and B means bidirectional search; both compress data relative to an I-frame.[1] An I-frame is a keyframe that fully preserves the picture; decoding it needs only this frame's data...
  • Clustering-based keyframe extraction

    The previous post mentioned that the keyframes could come out in the wrong order; this post fixes that. A visit pointer was added so that the frame loop no longer compares against every cluster one by one, only against the most recent one, which keeps the output frames in order. filenames=dir('D:\Documents\MATLAB...
  • Keyframe extraction with ffmpeg

    Saving this as a record so I don't forget; based on this blogger's post
  • Extracting keyframes from MP4

    def extract_all(videodir,save_dir): filenames = os.listdir(videodir) for file in filenames: if 'mp4' in file: savedir = os.path.join(save_dir,file.replace('.mp4','')) ...
  • C# source code for extracting keyframe data from video files (.rar)
  • Extracting keyframes with Python and OpenCV

    import cv2 cap = cv2.VideoCapture('/home/lw/3661....fps = cap.get(cv2.CAP_PROP_FPS) # get the frame rate print fps fWidth = cap.get(cv2.CAP_PROP_FRAME_WIDTH) print fWidth fHeight = cap.get(cv2.CAP_PROP_FRAME...
  • Extracting keyframes (I, P, and B frames) with live555 + ffmpeg
  • Extracting keyframes (I, P, and B frames) with live555 + ffmpeg: when developing a streaming-media player, especially on Windows Mobile or Symbian (S60), you may well need to build the player yourself. The S60 platform provides the CVideoPlayUtility interface to implement...
