  • Keyframe extraction

    2012-02-06 11:21:58
    Video processing: keyframe extraction. Extracting keyframes from a shot (I used three methods: boundary extraction, (color) feature extraction, and cluster-based extraction with K-means).
  • My graduation project: keyframe extraction code I wrote myself in MATLAB, debugged, with fairly good results, shared here. It references optical-flow keyframe extraction code and extracts keyframes from the Euclidean distance, mean, and variance of frame differences.
  • Video keyframe extraction

    2020-11-09 16:52:36

    Video keyframe extraction

    Preface

    As the saying goes, good work calls for good records, so now I am going to start keeping them.

    I. What are keyframes, and why extract them?

    1. Every video is a sequence of images whose content is far richer than a single image: more expressive and more informative. Video analysis is usually based on video frames, but frames are highly redundant, and frame extraction itself suffers from missed and redundant frames. Keyframe extraction captures the salient features of each shot in the video; it can substantially cut the time video retrieval takes and improve its precision.

    2. Definition of a keyframe: overlay every video frame in the image coordinate system; the feature vectors of the frames in a shot then trace a trajectory in that space, and the frames corresponding to the feature values on the trajectory are called keyframes [1].

    3. Video has a hierarchical structure made of three logical units: scenes, shots, and frames. Video retrieval is usually frame-based, so extracting keyframes is essential [2].

    II. Keyframe extraction methods

    1. Basic idea: split the video sequence into shots, extract keyframes within each shot, and then use the keyframes to obtain low-level features such as shape, texture, and color.

    2. Keyframe extraction methods:

    (1) Full image sequence

    The shot-boundary method takes the first and last frame (or the middle frame) of a shot as keyframes. It is simple and practical and suits shots with little activity or unchanging content, but it ignores the visual complexity of the shot, limits the number of keyframes per shot, and yields keyframes that are not very representative; the results are unstable. A minimal sketch follows.
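
    To make this concrete, here is a minimal sketch of my own (not from the cited papers), assuming the whole file is a single shot; 'video.mp4' is a placeholder path:

    import cv2

    def shot_boundary_keyframes(path):
        """Take the first, middle, and last frame of a (single-shot) video."""
        cap = cv2.VideoCapture(path)
        n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = {}
        for idx in (0, n // 2, max(n - 1, 0)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                frames[idx] = frame
        cap.release()
        return frames

    for idx, frame in shot_boundary_keyframes('video.mp4').items():
        cv2.imwrite('shot_keyframe_%d.jpg' % idx, frame)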

    (2) Compressed video

    (3) Keyframe extraction via clustering with a user-defined k plus content analysis [2]

    (4) a. Sampling-based keyframe extraction [3]

    Sampling-based methods pick video frames at random, or at random within prescribed time intervals. Simple, but not practical. (A short sketch follows this list.)

    b. Color-feature-based keyframe extraction

    c. Motion-analysis-based keyframe extraction

    d. Shot-boundary-based keyframe extraction (*)

    e. Content-based keyframe extraction (*)

    f. Clustering-based keyframe extraction (probably the best fit here), since the classes I need to recognize are already fixed.
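
    As promised above, a sketch of the sampling approach from (4)a (my own illustration, not from reference [3]); `step` is an arbitrary sampling interval:

    import cv2

    def sample_keyframes(path, step=100):
        """Uniform sampling: keep every step-th frame as a 'keyframe'."""
        cap = cv2.VideoCapture(path)
        kept, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                kept.append((idx, frame))
            idx += 1
        cap.release()
        return kept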

    (5) Keyframe extraction with a 3D-CNN [4]

    The cited work proposes a semantics-based keyframe extraction algorithm: first, hierarchical clustering makes an initial pass over the video frames; then a semantic-correlation step compares histograms of the initially extracted keyframes to remove redundant ones and fix the final keyframes. Compared with other algorithms, the keyframes extracted this way show relatively little redundancy.

    (6) First use a convolutional autoencoder to extract deep features of the video frames and cluster them with K-means; within each cluster, pick the sharpest frame by a clarity screen as the first-pass keyframe; then refine the first-pass keyframes with a point-density method to obtain the final keyframes for sign-language recognition [5]. (A rough sketch of the clustering step follows.)
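
    The autoencoder and the point-density refinement of [5] are not reproduced here, but the middle step (cluster deep features, keep the sharpest frame per cluster) can be sketched as follows. This is my own illustration: `features` is assumed to be a precomputed (n, d) array of frame features, and variance of the Laplacian stands in for the paper's clarity measure:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def sharpness(gray):
        # Variance of the Laplacian: a common no-reference sharpness measure.
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def pick_sharpest_per_cluster(frames, features, k=7):
        """frames: list of BGR images; features: (n, d) array, e.g. autoencoder codes."""
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
        keyframes = {}
        for c in range(k):
            idxs = np.flatnonzero(labels == c)
            best = max(idxs, key=lambda i: sharpness(
                cv2.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)))
            keyframes[c] = best   # index of the sharpest frame in cluster c
        return keyframes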

    (7) Keyframe extraction methods generally fall into four classes:

    Class 1: methods based on image content

    Class 2: methods based on motion analysis

    Class 3: keyframe detection based on the point density of trajectory curves

    Class 4 (the current mainstream): methods based on clustering

    (8) Inter-frame difference method

    Source: the code below comes from zyb_as's github

    # -*- coding: utf-8 -*-
    """
    Inter-frame local-maximum method
    Created on Tue Dec  4 16:48:57 2018
    keyframes extract tool
    this key frame extract algorithm is based on interframe difference.
    The principle is very simple
    First, we load the video and compute the interframe difference between each frames
    Then, we can choose one of these three methods to extract keyframes, which are 
    all based on the difference method:
        
    1. use the difference order
        The first few frames with the largest average interframe difference 
        are considered to be key frames.
    2. use the difference threshold
        The frames which the average interframe difference are large than the 
        threshold are considered to be key frames.
    3. use local maximum
        The frames which the average interframe difference are local maximum are 
        considered to be key frames.
        It should be noted that smoothing the average difference value before 
        calculating the local maximum can effectively remove noise to avoid 
        repeated extraction of frames of similar scenes.
    After a few experiments, the third method has a better key frame extraction effect.
    The original code comes from the link below, I optimized the code to reduce 
    unnecessary memory consumption.
    https://blog.csdn.net/qq_21997625/article/details/81285096
    @author: zyb_as
    """ 
    import cv2
    import operator
    import numpy as np
    import matplotlib.pyplot as plt
    import sys
    from scipy.signal import argrelextrema
    
     
    def smooth(x, window_len=13, window='hanning'):
        """smooth the data using a window with requested size.
        
        This method is based on the convolution of a scaled window with the signal.
        The signal is prepared by introducing reflected copies of the signal 
        (with the window size) in both ends so that transient parts are minimized
        in the beginning and end parts of the output signal.
        
        input:
            x: the input signal 
            window_len: the dimension of the smoothing window
            window: the type of window from 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'
                flat window will produce a moving average smoothing.
        output:
            the smoothed signal
            
        example:
        import numpy as np    
        t = np.arange(-2, 2, 0.1)
        x = np.sin(t)+np.random.randn(len(t))*0.1
        y = smooth(x)
        
        see also: 
        
        numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve
        scipy.signal.lfilter
     
        TODO: the window parameter could be the window itself if an array instead of a string   
        """
        print(len(x), window_len)
        # if x.ndim != 1:
        #     raise ValueError("smooth only accepts 1 dimension arrays.")
        #
        # if x.size < window_len:
        #     raise ValueError("Input vector needs to be bigger than window size.")
        #
        # if window_len < 3:
        #     return x
        #
        # if window not in ['flat', 'hanning', 'hamming', 'bartlett', 'blackman']:
        #     raise ValueError("Window is one of 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'")
     
        s = np.r_[2 * x[0] - x[window_len:1:-1],
                  x, 2 * x[-1] - x[-1:-window_len:-1]]
        #print(len(s))
     
        if window == 'flat':  # moving average
            w = np.ones(window_len, 'd')
        else:
            w = getattr(np, window)(window_len)
        y = np.convolve(w / w.sum(), s, mode='same')
        return y[window_len - 1:-window_len + 1]
     
    
    class Frame:
        """class to hold information about each frame
        
        """
        def __init__(self, id, diff):
            self.id = id
            self.diff = diff
     
        def __lt__(self, other):
            return self.id < other.id
 
        def __gt__(self, other):
            return other.__lt__(self)
 
        def __eq__(self, other):
            return self.id == other.id
 
        def __ne__(self, other):
            return not self.__eq__(other)
     
     
    def rel_change(a, b):
       x = (b - a) / max(a, b)
       print(x)
       return x
     
        
    if __name__ == "__main__":
        print(sys.executable)
        #Setting fixed threshold criteria
        USE_THRESH = False
        #fixed threshold value
        THRESH = 0.6
        #Setting fixed threshold criteria
        USE_TOP_ORDER = False
        #Setting local maxima criteria
        USE_LOCAL_MAXIMA = True
        #Number of top sorted frames
        NUM_TOP_FRAMES = 50
         
        #Video path of the source file
        videopath = 'pikachu.mp4'
        #Directory to store the processed frames
        dir = './extract_result/'
        #smoothing window size
        len_window = int(50)
        
        
        print("target video :" + videopath)
        print("frame save directory: " + dir)
        # load video and compute diff between frames
        cap = cv2.VideoCapture(str(videopath)) 
        curr_frame = None
        prev_frame = None 
        frame_diffs = []
        frames = []
        success, frame = cap.read()
        i = 0 
        while(success):
            luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
            curr_frame = luv
            if curr_frame is not None and prev_frame is not None:
                #logic here
                diff = cv2.absdiff(curr_frame, prev_frame)
                diff_sum = np.sum(diff)
                diff_sum_mean = diff_sum / (diff.shape[0] * diff.shape[1])
                frame_diffs.append(diff_sum_mean)
                frame = Frame(i, diff_sum_mean)
                frames.append(frame)
            prev_frame = curr_frame
            i = i + 1
            success, frame = cap.read()   
        cap.release()
        
        # compute keyframe
        keyframe_id_set = set()
        if USE_TOP_ORDER:
            # sort the list in descending order
            frames.sort(key=operator.attrgetter("diff"), reverse=True)
            for keyframe in frames[:NUM_TOP_FRAMES]:
                keyframe_id_set.add(keyframe.id) 
        if USE_THRESH:
            print("Using Threshold")
            for i in range(1, len(frames)):
            if (rel_change(float(frames[i - 1].diff), float(frames[i].diff)) >= THRESH):
                    keyframe_id_set.add(frames[i].id)   
        if USE_LOCAL_MAXIMA:
            print("Using Local Maxima")
            diff_array = np.array(frame_diffs)
            sm_diff_array = smooth(diff_array, len_window)
            frame_indexes = np.asarray(argrelextrema(sm_diff_array, np.greater))[0]
            for i in frame_indexes:
                keyframe_id_set.add(frames[i - 1].id)
                
            plt.figure(figsize=(40, 20))
            plt.locator_params(numticks=100)
            plt.stem(sm_diff_array)
            plt.savefig(dir + 'plot.png')
        
        # save all keyframes as image
        cap = cv2.VideoCapture(str(videopath))
        curr_frame = None
        keyframes = []
        success, frame = cap.read()
        idx = 0
        while(success):
            if idx in keyframe_id_set:
                name = "keyframe_" + str(idx) + ".jpg"
                cv2.imwrite(dir + name, frame)
                keyframe_id_set.remove(idx)
            idx = idx + 1
            success, frame = cap.read()
        cap.release()
    

    Keyframe extraction via motion analysis (optical flow)

    Source: AillenAnthony's github

    # Scripts to try and detect key frames that represent scene transitions
    # in a video. Has only been tried out on video of slides, so is likely not
    # robust for other types of video.
    
    # 1. Based on image information
    # 2. Based on motion analysis (optical flow)
    
    import cv2
    import argparse
    import json
    import os
    import numpy as np
    import errno
    
    def getInfo(sourcePath):
        cap = cv2.VideoCapture(sourcePath)
        info = {
            "framecount": cap.get(cv2.CAP_PROP_FRAME_COUNT),
            "fps": cap.get(cv2.CAP_PROP_FPS),
            "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            "heigth": int(cap.get(cv2.CAP_PROP_FRAME_Heigth)),
            "codec": int(cap.get(cv2.CAP_PROP_FOURCC))
        }
        cap.release()
        return info
    
    
    def scale(img, xScale, yScale):
        res = cv2.resize(img, None,fx=xScale, fy=yScale, interpolation = cv2.INTER_AREA)
        return res
    
    def resize(img, width, height):
        res = cv2.resize(img, (width, height), interpolation = cv2.INTER_AREA)
        return res
    
    #
    # Extract [numCols] dominant colors from an image.
    # Uses KMeans on the pixels and then returns the centroids
    # of the colors.
    #
    def extract_cols(image, numCols):
        # convert to np.float32 matrix that can be clustered
        Z = image.reshape((-1,3))
        Z = np.float32(Z)
    
        # Set parameters for the clustering
        max_iter = 20
        epsilon = 1.0
        K = numCols
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, max_iter, epsilon)
        labels = np.array([])
        # cluster
        compactness, labels, centers = cv2.kmeans(Z, K, labels, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    
        clusterCounts = []
        for idx in range(K):
            count = len(Z[labels.ravel() == idx])
            clusterCounts.append(count)
    
        #Reverse the cols stored in centers because cols are stored in BGR
        #in opencv.
        rgbCenters = []
        for center in centers:
            bgr = center.tolist()
            bgr.reverse()
            rgbCenters.append(bgr)
    
        cols = []
        for i in range(K):
            iCol = {
                "count": clusterCounts[i],
                "col": rgbCenters[i]
            }
            cols.append(iCol)
    
        return cols
    
    
    #
    # Calculates change data from one frame to the next.
    #
    def calculateFrameStats(sourcePath, verbose=False, after_frame=0):  # difference between adjacent frames
        cap = cv2.VideoCapture(sourcePath)  # open the video
    
        data = {
            "frame_info": []
        }
    
        lastFrame = None
        while(cap.isOpened()):
            ret, frame = cap.read()
            if frame is None:
                break
    
            frame_number = cap.get(cv2.CAP_PROP_POS_FRAMES) - 1
    
            # Convert to grayscale, scale down and blur to make
            # calculating image differences more robust to noise
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # grayscale
            gray = scale(gray, 0.25, 0.25)      # downscale to a quarter of the size
            gray = cv2.GaussianBlur(gray, (9,9), 0.0)   # Gaussian blur
    
            if frame_number < after_frame:
                lastFrame = gray
                continue
    
    
            if lastFrame is not None:

                diff = cv2.subtract(gray, lastFrame)        # subtract the previous frame from the current one

                diffMag = cv2.countNonZero(diff)        # count pixels whose gray level changed between the two frames
    
                frame_info = {
                    "frame_number": int(frame_number),
                    "diff_count": int(diffMag)
                }
                data["frame_info"].append(frame_info)
    
                if verbose:
                    cv2.imshow('diff', diff)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
    
            # Keep a ref to this frame for differencing on the next iteration
            lastFrame = gray
    
        cap.release()
        cv2.destroyAllWindows()
    
        #compute some states
        diff_counts = [fi["diff_count"] for fi in data["frame_info"]]
        data["stats"] = {
            "num": len(diff_counts),
            "min": np.min(diff_counts),
            "max": np.max(diff_counts),
            "mean": np.mean(diff_counts),
            "median": np.median(diff_counts),
            "sd": np.std(diff_counts)   # 计算所有帧之间, 像素变化个数的标准差
        }
        greater_than_mean = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["mean"]]
        greater_than_median = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["median"]]
        greater_than_one_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > data["stats"]["sd"] + data["stats"]["mean"]]
        greater_than_two_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > (data["stats"]["sd"] * 2) + data["stats"]["mean"]]
        greater_than_three_sd = [fi for fi in data["frame_info"] if fi["diff_count"] > (data["stats"]["sd"] * 3) + data["stats"]["mean"]]
    
        # additional statistics
        data["stats"]["greater_than_mean"] = len(greater_than_mean)
        data["stats"]["greater_than_median"] = len(greater_than_median)
        data["stats"]["greater_than_one_sd"] = len(greater_than_one_sd)
        data["stats"]["greater_than_three_sd"] = len(greater_than_three_sd)
        data["stats"]["greater_than_two_sd"] = len(greater_than_two_sd)
    
        return data
    
    
    
    #
    # Take an image and write it out at various sizes.
    #
    # TODO: Create output directories if they do not exist.
    #
    def writeImagePyramid(destPath, name, seqNumber, image):
        fullPath = os.path.join(destPath, "full", name + "-" + str(seqNumber) + ".png")
        halfPath = os.path.join(destPath, "half", name + "-" + str(seqNumber) + ".png")
        quarterPath = os.path.join(destPath, "quarter", name + "-" + str(seqNumber) + ".png")
        eigthPath = os.path.join(destPath, "eigth", name + "-" + str(seqNumber) + ".png")
        sixteenthPath = os.path.join(destPath, "sixteenth", name + "-" + str(seqNumber) + ".png")
    
        hImage = scale(image, 0.5, 0.5)
        qImage = scale(image, 0.25, 0.25)
        eImage = scale(image, 0.125, 0.125)
        sImage = scale(image, 0.0625, 0.0625)
    
        cv2.imwrite(fullPath, image)
        cv2.imwrite(halfPath, hImage)
        cv2.imwrite(quarterPath, qImage)
        cv2.imwrite(eigthPath, eImage)
        cv2.imwrite(sixteenthPath, sImage)
    
    
    
    #
    # Selects a set of frames as key frames (frames that represent a significant difference in
    # the video i.e. potential scene changes). Key frames are selected as those frames where the
    # number of pixels that changed from the previous frame is more than a configurable multiple
    # of the standard deviation above the mean number of changed pixels across all interframe
    # changes (2.05 in the code below; the commented-out alternative uses 1.85).
    #
    def detectScenes(sourcePath, destPath, data, name, verbose=False):
        destDir = os.path.join(destPath, "images")
    
        # TODO make sd multiplier externally configurable
        #diff_threshold = (data["stats"]["sd"] * 1.85) + data["stats"]["mean"]
        diff_threshold = (data["stats"]["sd"] * 2.05) + (data["stats"]["mean"])
    
        cap = cv2.VideoCapture(sourcePath)
        for index, fi in enumerate(data["frame_info"]):
            if fi["diff_count"] < diff_threshold:
                continue
    
            cap.set(cv2.CAP_PROP_POS_FRAMES, fi["frame_number"])
            ret, frame = cap.read()

            if frame is not None:
                # extract dominant colors (guarded so a failed read cannot crash resize)
                small = resize(frame, 100, 100)
                cols = extract_cols(small, 5)
                data["frame_info"][index]["dominant_cols"] = cols

                writeImagePyramid(destDir, name, fi["frame_number"], frame)
    
                if verbose:
                    cv2.imshow('extract', frame)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
    
        cap.release()
        cv2.destroyAllWindows()
        return data
    
    
    def makeOutputDirs(path):
        try:
            #todo this doesn't quite work like mkdirp. it will fail
            #if any folder along the path exists. fix
            os.makedirs(os.path.join(path, "metadata"))
            os.makedirs(os.path.join(path, "images", "full"))
            os.makedirs(os.path.join(path, "images", "half"))
            os.makedirs(os.path.join(path, "images", "quarter"))
            os.makedirs(os.path.join(path, "images", "eigth"))
            os.makedirs(os.path.join(path, "images", "sixteenth"))
        except OSError as exc: # Python >2.5
            if exc.errno == errno.EEXIST and os.path.isdir(path):
                pass
            else: raise
    
    
    if __name__ == '__main__':
    
        parser = argparse.ArgumentParser()
    
        # parser.add_argument('-s','--source', help='source file', required=True)
        # parser.add_argument('-d', '--dest', help='dest folder', required=True)
        # parser.add_argument('-n', '--name', help='image sequence name', required=True)
        # parser.add_argument('-a','--after_frame', help='after frame', default=0)
        # parser.add_argument('-v', '--verbose', action='store_true')
        # parser.set_defaults(verbose=False)
    
        parser.add_argument('-s','--source', help='source file', default="dataset/video/wash_hand/00000.mp4")
        parser.add_argument('-d', '--dest', help='dest folder', default="dataset/video/key_frame")
        parser.add_argument('-n', '--name', help='image sequence name', default="")
        parser.add_argument('-a','--after_frame', help='after frame', default=0)
        parser.add_argument('-v', '--verbose', action='store_true')
        parser.set_defaults(verbose=False)
    
        args = parser.parse_args()
    
        if args.verbose:
            info = getInfo(args.source)
            print("Source Info: ", info)
    
        makeOutputDirs(args.dest)
    
        # Run the extraction
        data = calculateFrameStats(args.source, args.verbose, int(args.after_frame))
        data = detectScenes(args.source, args.dest, data, args.name, args.verbose)
        keyframeInfo = [frame_info for frame_info in data["frame_info"] if "dominant_cols" in frame_info]
    
        # Write out the results
        data_fp = os.path.join(args.dest, "metadata", args.name + "-meta.json")
        with open(data_fp, 'w') as f:
            data_json_str = json.dumps(data, indent=4)
            f.write(data_json_str)
    
        keyframe_info_fp = os.path.join(args.dest, "metadata", args.name + "-keyframe-meta.json")
        with open(keyframe_info_fp, 'w') as f:
            data_json_str = json.dumps(keyframeInfo, indent=4)
            f.write(data_json_str)
    

    (9) Keyframe extraction with ffmpeg

    The code can be found here (the original link is lost); a common one-liner is sketched below.

    (10) K-means clustering

    The source code can be found here

    filenames=dir('images/*.jpg');
    %file_name = fly-1;
    num=size(filenames,1);  %number of image files in the images folder
    key=zeros(1,num);  %keyframe flags, one per frame, initially all zero
    cluster=zeros(1,num);   %cluster id assigned to each frame
    clusterCount=zeros(1,num);  %number of frames in each cluster
    count=0;        %number of clusters
 
    %threshold=0.75;  %a larger threshold keeps more frames
    %for the airplane video a threshold of 0.93 is suitable; 0.95 is better
    %********************************************************threshold**************************************************************%
    threshold=0.91;  %similarity threshold
    centrodR=zeros(num,256);   %R histogram of each cluster centroid; adjusted whenever a similar frame joins the cluster
    centrodG=zeros(num,256);   %G histogram of each cluster centroid
    centrodB=zeros(num,256);   %B histogram of each cluster centroid
 
    if num==0
        error('Sorry, there is no pictures in images folder!');
    else
        %the first frame forms the first cluster
        img=imread(strcat('images/',filenames(1).name));
        count=count+1;    %create the first cluster
        [preCountR,x]=imhist(img(:,:,1));   %red histogram: 256 bins, bin count as height
        [preCountG,x]=imhist(img(:,:,2));   %green histogram
        [preCountB,x]=imhist(img(:,:,3));   %blue histogram
        
        cluster(1)=1;   %the first cluster's keyframe starts as the first frame
        clusterCount(1)=clusterCount(1)+1;  %cluster 1 now holds one frame
        centrodR(1,:)=preCountR; %initialize centroid 1 with the first frame's histograms
        centrodG(1,:)=preCountG;
        centrodB(1,:)=preCountB;
       
        visit = 1;
        for k=2:num
            img=imread(strcat('images/',filenames(k).name));  %read each remaining frame in turn
            [tmpCountR,x]=imhist(img(:,:,1));   %red histogram of frame k
            [tmpCountG,x]=imhist(img(:,:,2));   %green histogram
            [tmpCountB,x]=imhist(img(:,:,3));   %blue histogram
 
            clusterGroupId=1;  %cluster that frame k will join
            maxSimilar=0;   %best similarity so far
        
           
            for clusterCountI= visit:count          %decide which candidate cluster frame k belongs to (from cluster visit onward)
                sR=0;
                sG=0;
                sB=0;
                %color-histogram similarity (histogram intersection)
                for j=1:256
                    sR=min(centrodR(clusterCountI,j),tmpCountR(j))+sR;%bin-wise minimum of the centroid and frame-k histograms, summed over all 256 bins
                    sG=min(centrodG(clusterCountI,j),tmpCountG(j))+sG;
                    sB=min(centrodB(clusterCountI,j),tmpCountB(j))+sB;
                end
                dR=sR/sum(tmpCountR);
                dG=sG/sum(tmpCountG);
                dB=sB/sum(tmpCountB);
                %YUV: the eye is most sensitive to Y (luminance)
                d=0.30*dR+0.59*dG+0.11*dB;  %weighted color-histogram similarity
                if d>maxSimilar
                    clusterGroupId=clusterCountI;
                    maxSimilar=d;
                end
            end
            
            if maxSimilar>threshold
                %high similarity, i.e. small distance to this centroid:
                %join the cluster and update its centroid as a running mean
                for ii=1:256    
                    centrodR(clusterGroupId,ii)=centrodR(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountR(ii)*1.0/(clusterCount(clusterGroupId)+1);
                    centrodG(clusterGroupId,ii)=centrodG(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountG(ii)*1.0/(clusterCount(clusterGroupId)+1);
                    centrodB(clusterGroupId,ii)=centrodB(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1)+tmpCountB(ii)*1.0/(clusterCount(clusterGroupId)+1);
                end
                clusterCount(clusterGroupId)=clusterCount(clusterGroupId)+1;
                cluster(k)=clusterGroupId;   %frame k belongs to cluster clusterGroupId
            else
                %otherwise form a new cluster with frame k as its centroid
                count=count+1;
                 visit = visit+1;
                clusterCount(count)=clusterCount(count)+1;
                centrodR(count,:)=tmpCountR;
                centrodG(count,:)=tmpCountG;
                centrodB(count,:)=tmpCountB;
                cluster(k)=count;   %frame k belongs to the newly created cluster
            end
        end
        
        %every frame is now assigned: there are count clusters and frame k is in cluster(k)
        %now take, from each cluster, the frame closest to its centroid (highest similarity) as the cluster's keyframe
        maxSimilarity=zeros(1,count);
        frame=zeros(1,count);
        for i=1:num
            img=imread(strcat('images/',filenames(i).name));  %recompute frame i's histograms
            [tmpCountR,x]=imhist(img(:,:,1));   %(the original code reused the last frame's histograms here by mistake)
            [tmpCountG,x]=imhist(img(:,:,2));
            [tmpCountB,x]=imhist(img(:,:,3));
            sR=0;
            sG=0;
            sB=0;
            %color-histogram similarity against the frame's own cluster centroid
            for j=1:256
                sR=min(centrodR(cluster(i),j),tmpCountR(j))+sR;%compare each frame with its cluster centroid, bin-wise minimum
                sG=min(centrodG(cluster(i),j),tmpCountG(j))+sG;
                sB=min(centrodB(cluster(i),j),tmpCountB(j))+sB;
            end
            dR=sR/sum(tmpCountR);
            dG=sG/sum(tmpCountG);
            dB=sB/sum(tmpCountB);
            %YUV: the eye is most sensitive to Y
            d=0.30*dR+0.59*dG+0.11*dB;
            if d>maxSimilarity(cluster(i))
                maxSimilarity(cluster(i))=d;
                frame(cluster(i))=i;
            end
        end
        
        for j=1:count
            key(frame(j))=1;
            figure(j);
            imshow(strcat('images/',filenames(frame(j)).name));
        end
    end
 
    keyFrameIndexes=find(key)
    

    This method extracted 198 frames out of 878, so the redundancy is still fairly high.

    (11) Use a CNN to save the images, then extract keyframes with clustering. Because of problems with my TensorFlow environment I never actually ran this; the link is given here:

    this

    III. Summary of results

    Using a one-minute-fifteen-second video (00000.MP4) with 1898 frames in seven classes, the max-difference variant of the inter-frame difference method extracted keyframes very well. The k-means clustering result was poor, because (1) the code was copied from others and never optimized, and (2) my theoretical understanding of inter-frame clustering is too shallow to guide practice.

    k-means clustering extracted 484 frames

    The max-difference inter-frame method extracted 35 frames

    References:

    [1]苏筱涵.深度学习视角下视频关键帧提取与视频检索研究[J].网络安全技术与应用,2020(05):65-66.

    [2]王红霞,王磊,晏杉杉.视频检索中的关键帧提取方法研究[J].沈阳理工大学学报,2019,38(03):78-82.

    [3]王俊玲,卢新明.基于语义相关的视频关键帧提取算法[J/OL].计算机工程与应用:1-10[2020-11-04].http://kns.cnki.net/kcms/detail/11.2127.TP.20200319.1706.018.html.

    [4] 张晓宇,张云华.基于融合特征的视频关键帧提取方法.计算机系统应用,2019,28(11):176–181. http://www.c-s-a.org.cn/1003-3254/7163.html

    [5] 周舟,韩芳,王直杰.面向手语识别的视频关键帧提取和优化算法[J/OL].华东理工大学学报(自然科学版):1-8[2020-11-05].https://doi.org/10.14135/j.cnki.1006-3080.20191201002.

    [6] https://me.csdn.net/cungudafa (this blogger's posts)

    Appendix:

    1. Summary of notes on digital video and audio processing

  • To address the poor motion expressiveness of keyframes extracted from sports video, and taking aerobics video as an example, prior semantics are introduced into video segment splitting and keyframe feature extraction, giving a prior-based keyframe extraction algorithm for sports video. The algorithm uses prosodic features and action...
  • Video keyframe extraction

    2015-12-10 10:49:13
    MATLAB keyframe extraction; it can also split a video into individual images. A very handy tool.
  • Keyframe extraction is one of the key technologies for video retrieval and video summarization. Existing methods lack spatio-temporal analysis and their results do not match human visual perception, so a new keyframe extraction method is proposed: first extract the spatio-temporal slices of the shot, then...
  • This paper first introduces the background and significance of keyframe extraction research and its state at home and abroad, then describes and analyzes several currently popular keyframe extraction methods in detail, testing each of them. Common keyframe extraction algorithms include shot-boundary-based methods...
  • Video analysis is currently usually frame-based, but frames carry a lot of redundancy, so keyframe extraction is crucial. Traditional hand-crafted extraction methods typically suffer from missed and redundant frames.... Experiments show the proposed method outperforms previous keyframe extraction methods.
  • Research on video keyframe extraction methods
  • MATLAB keyframe extraction

    2013-07-24 10:30:27
    Very practical MATLAB keyframe extraction, for the MPEG format
  • For sports videos, whose features are hard to extract and whose keyframe results often miss frames, a keyframe extraction algorithm based on moving-object features is proposed. It emphasizes moving-object features while weakening background features, preventing the background from dominating the picture when the moving object is small...
  • 《结合主成分分析和聚类的关键帧提取》 (Keyframe Extraction Combining Principal Component Analysis and Clustering)
    Based on
    《结合主成分分析和聚类的关键帧提取》 (Keyframe Extraction Combining Principal Component Analysis and Clustering)
    by
    许文竹, 徐立鸿

    Clustering is not computationally cheap, so we reduce the dimensionality of the data before clustering.

    Principal component analysis

    A brief introduction

    Here we use PCA to extract image features.

    PCA is mainly used for dimensionality reduction: for high-dimensional vectors it finds a projection matrix that maps the features from a high-dimensional space to a low-dimensional one while still reflecting the content of the image.

    Concrete procedure

    For a $w \times h$ image, flattening gives a vector of shape $(1, wh)$.
    The dimension is high; let $X$ stack all frames, so $X$ has shape $(n, wh)$.

    We compute the overall covariance matrix $\Sigma = \frac{1}{N}\sum\limits_{i=1}^{N}(X_i-u)(X_i-u)^T$,
    where $u$ is the mean frame.

    Once we have the covariance matrix we can solve for the eigenvalues and eigenvectors of $\Sigma$; QR or SVD both work.
    Writing the eigenvalues as $\lambda_i$, we require $\alpha \leq \frac{\sum\limits_{i=1}^{L}\lambda_i}{\sum\limits_{i=1}^{wh} \lambda_i}$, where $L$ is the dimension we reduce to,
    and $\alpha$ is typically taken between 0.90 and 0.99.

    But the covariance matrix is too large for us to compute its eigenvalues explicitly, so we obtain the singular values via SVD and use them for the selection instead. The final dimension must also be capped at the number of images (I do not fully understand the mathematical reason, but without the cap it errors out). A sketch of this step follows.
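
    A minimal sketch of this reduction step (my own illustration, not the paper's code), assuming `X` holds the flattened frames row-wise; the 0.90 energy ratio plays the role of $\alpha$ above:

    import numpy as np

    def pca_reduce(X, alpha=0.90):
        """X: (n, w*h) flattened frames; keep enough components to cover alpha of the spectrum."""
        Xc = X - X.mean(axis=0)                             # center the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # singular values stand in for eigenvalues
        energy = np.cumsum(S) / np.sum(S)
        L = int(np.searchsorted(energy, alpha)) + 1
        L = min(L, min(X.shape))                            # cap at the number of frames
        return Xc @ Vt[:L].T                                # (n, L) reduced features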

    Clustering

    With the image features in hand, we cluster the frames using k-means, with some caveats:
    1. The clustering does not hunt for centers arbitrarily; each frame is only matched against the cluster centers near it.
    2. Choice of $k$: when the average frame difference is $\leq 3500$, the video as a whole changes slowly, but to allow for locally intense motion we take $k=\max(k_1,k_2)$, where $k_1$ is a proportional keyframe count, $k_1 = n/100$, and $k_2$ is the number of frames whose difference exceeds a threshold $T$, set here to $T = 13000$.
    3. When the average difference is $> 3500$, $k = k_1 = n/50$.
    4. To prevent excessive iteration, we cap the iteration count at 100.

    This yields the result. Written, but not yet tested.
    The code follows:

    import numpy as np
    from sklearn.decomposition import PCA
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,355,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals (355 fixes a 255 typo; cf. the corrected listing further below)
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = int(cap.get(7))  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    len_windows = 0
    local_windows = 0
     
    def smooth(swift_img,windows):
        r = swift_img.shape[1]
        c = swift_img.shape[2]
        for i in range(r):
            for j in range(c):
                L = swift_img[:,i,j]
                L = np.convolve(L,np.ones(windows),'same')
                swift_img[:,i,j] = L
        return swift_img
        
    def get_block(img):
        img = np.array(img)
        img = img.ravel()
        return img
        
    def get_img(now_time = 0,get_number = Frame_number):  # parameterized to make the parameter study easier
        swift_img = []   # transformed frames
        index = 0        # frame counter
        time = now_time  # current timestamp in ms
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC,time)
            ret,img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
            if index % 50 ==0:
                print("reached frame "+str(index))
        swift_img = np.array(swift_img)
        return swift_img
    
    def get_key_frame(Change):
        diff_C = Change[1:] - Change[:-1]
        mid_d = np.zeros(Change.shape[1])
        for i in range(diff_C.shape[1]):
            mid_d[i] = np.mean(diff_C[:,i])
        mid_d = np.sum(np.abs(mid_d))
        k = 0
        k1 = 0
        k2 = 0 
        T = 13000
        # choose the number of clusters k
        # mid_d <= 3500 means the video content changes gently
        print(mid_d)
        if mid_d <= 3500:
            k1 = Frame_number / 100
            k2 = np.sum(diff_C >= T)
            k = max(k1,k2)
        else :
            k1 = Frame_number /  50
            k2 = np.sum(diff_C >= T)
            k = k1
        k = int(k)
        print(k)
        # the number of keyframes is now fixed; extract them by clustering
        Cluster = []
        set_cluster = []
        now = 0
        for i in range(k):
            if now >= Frame_number:
                now = Frame_number
            Cluster.append(Change[now-1])  # note: with now = 0 this seeds from Change[-1], the last frame
            set_cluster.append({now})
            now += int(Frame_number / k)
        cnt = 0  # guard against too many iterations
        while True:
            cnt += 1
            now = 0  # index of the cluster currently being filled
            for i in range(k):
                set_cluster[i].clear()  # empty every cluster first
            for i in range(Frame_number):
                l = now 
                r = min(now + 1,k-1)
                ldiff = np.mean(abs(Cluster[l] - Change[i]))
                rdiff = np.mean(abs(Cluster[r] - Change[i]))
                if ldiff < rdiff:
                    set_cluster[l].add(i)
                else :
                    set_cluster[r].add(i)
                    now = r
            ok = True
            for i in range(k):
                Len = len(set_cluster[i])
                if Len == 0:
                    continue
                set_sum = np.zeros(Change.shape[1])
                for x in set_cluster[i]:
                    set_sum = set_sum + Change[x]
                set_sum /= Len
                if np.mean(abs(Cluster[i]-set_sum)) < 1e-10:
                    continue
                ok = False  
                Cluster[i] = set_sum
            print("clustering pass "+str(cnt))
            if cnt >= 100 or ok == True:
                break
        TL = []
        for i in range(int(Frame_number)):
            TL.append(False)
        for i in range(k):
            MIN = 1e20
            for x in set_cluster[i]:
                MIN = min(MIN,np.mean(np.abs(Change[x] - Cluster[i])))
            for x in set_cluster[i]:
                if abs(MIN - np.mean(np.abs(Change[x] - Cluster[i]))) < 1e-10:
                    TL[x] = True
                    break
        TL = np.array(TL)
        return TL
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i (the original only advanced the time on kept frames, a bug)
            ret,img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num),img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        right = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if j == 0 :
                    print(i)
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                right += 1
                if num == 1:
                    rate.append(6.0)
                    continue
                add += num - 1
                rate.append(6.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("number of extra frames:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("proportion of extra frames:")
        print(add)
        print("correctness score:")
        print(right)
        ret += 4 * (1 - add) * right / (r - l + 1)  # only the correct part of the total counts toward the time factor
        print("score:")
        print(ret)
        return ret
    
    def study():  # parameter search; note this helper is not called in this script
        window = 1
        local = 2
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(10):
            tmp = 1 + i
            for j in range(10):
                Tmp = 2 + j
                print("current parameters: smoothing window "+str(tmp)+", extremum window "+str(Tmp))
                tmp_img = get_img(ansl[lindex],ansr[rindex])
                tmp_img = smooth(tmp_img,tmp)
                tmp_L = get_key_frame(tmp_img)  # get_key_frame takes one argument (the original also passed Tmp)
                ttmp = cal_ans(tmp_L,lindex,rindex)
                if ttmp > mmax:
                    window = tmp
                    local = Tmp
                    mmax = ttmp
                print("--------------------")
        return window,local
    
    def PCA_get_feature(X):
        # k is the number of components to keep
        # subtract the mean of each feature
        mean_X = X.mean(axis = 0)
        X = X - mean_X
        # the data are now centered
        k = 1
        U,S,V = np.linalg.svd(X,full_matrices = False)
        index = 0
        S_sum = np.sum(S)
        now = 0
        P = 0
        while True:
            now += S[index]
            index+=1
            if now >= 0.90 * S_sum:
                P = index / S.shape[0]
                break  # stop once 90% of the spectrum is covered (the original kept looping and always ended with P = 1.0)
            if index == S.shape[0]:
                P = 1.0
                break
        k =  int(P * min(X.shape[1],X.shape[0]))
        # the reduced dimension
        pca = PCA(n_components = k)
        pca.fit(X)  # fit the model on the centered X
        new_x = pca.fit_transform(X)  # the dimensionality-reduced data
        return new_x
    
    swift_img = get_img()
    Frame_number = int(swift_img.shape[0])
    #Change = PCA_get_feature(swift_img)
    cal_L = get_key_frame(swift_img)
    print("结束")
    cal_ans(cal_L,0,ansl.shape[0]-1)
    
    

    The initial score: 4.4.
    Clearly, seeding the cluster centers by fixed segments is not good enough.
    Seeding them proportionally reaches 6.01.
    Seeding them at extremum points reaches 7.91.

    import numpy as np
    from sklearn.decomposition import PCA
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,355,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = int(cap.get(7))  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    len_windows = 0
    local_windows = 0
     
    def smooth(swift_img,windows):
        r = swift_img.shape[1]
        c = swift_img.shape[2]
        for i in range(r):
            for j in range(c):
                L = swift_img[:,i,j]
                L = np.convolve(L,np.ones(windows),'same')
                swift_img[:,i,j] = L
        return swift_img
        
    def get_block(img):
        img = np.array(img)
        img = img.ravel()
        return img
        
    def get_img(now_time = 0,get_number = Frame_number):  # parameterized to make the parameter study easier
        swift_img = []   # transformed frames
        index = 0        # frame counter
        time = now_time  # current timestamp in ms
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC,time)
            ret,img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
            if index % 50 ==0:
                print("reached frame "+str(index))
        swift_img = np.array(swift_img)
        return swift_img
    
    def get_key_frame(Change):
        diff_C = Change[1:] - Change[:-1]
        mid_d = np.zeros(Change.shape[1])
        for i in range(diff_C.shape[1]):
            mid_d[i] = np.mean(diff_C[:,i])
        mid_d = np.sum(np.abs(mid_d))
        k = 0
        k1 = 0
        k2 = 0 
        T = 13000
        # choose the number of clusters k
        # mid_d <= 3500 means the video content changes gently
        print(mid_d)
        if mid_d <= 3500:
            k1 = Frame_number / 100
            k2 = np.sum(diff_C >= T)
            k = max(k1,k2)
        else :
            k1 = Frame_number /  50
            k2 = np.sum(diff_C >= T)
            k = k1
        k = int(k)
        print(k)
        # the number of keyframes is now fixed; extract them by clustering
        Cluster = []
        set_cluster = []
        now = 0
        for i in range(k):
            if now >= Frame_number - 2:
                now = Frame_number - 2 
            Cluster.append(Change[now])
            set_cluster.append({now})
            if(np.sum(np.abs(diff_C[now]))>mid_d):  # step faster through high-difference regions
                now += int(Frame_number / (3 * k))
            else:
                now += int(Frame_number / k)
        cnt = 0  # guard against too many iterations
        while True:
            cnt += 1
            now = 0  # index of the cluster currently being filled
            for i in range(k):
                set_cluster[i].clear()  # empty every cluster first
            for i in range(Frame_number):
                l = now 
                r = min(now + 1,k-1)
                ldiff = np.mean(abs(Cluster[l] - Change[i]))
                rdiff = np.mean(abs(Cluster[r] - Change[i]))
                if ldiff < rdiff:
                    set_cluster[l].add(i)
                else :
                    set_cluster[r].add(i)
                    now = r
            ok = True
            for i in range(k):
                Len = len(set_cluster[i])
                if Len == 0:
                    continue
                set_sum = np.zeros(Change.shape[1])
                for x in set_cluster[i]:
                    set_sum = set_sum + Change[x]
                set_sum /= Len
                if np.mean(abs(Cluster[i]-set_sum)) < 1e-10:
                    continue
                ok = False  
                Cluster[i] = set_sum
            print("clustering pass "+str(cnt))
            if cnt >= 100 or ok == True:
                break
        TL = []
        for i in range(int(Frame_number)):
            TL.append(False)
        for i in range(k):
            MIN = 1e20
            for x in set_cluster[i]:
                MIN = min(MIN,np.mean(np.abs(Change[x] - Cluster[i])))
            for x in set_cluster[i]:
                if abs(MIN - np.mean(np.abs(Change[x] - Cluster[i]))) < 1e-10:
                    TL[x] = True
                    break
        TL = np.array(TL)
        return TL
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i (the original only advanced the time on kept frames, a bug)
            ret,img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num),img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        right = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if j == 0 :
                    print(i)
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                right += 1
                if num == 1:
                    rate.append(6.0)
                    continue
                add += num - 1
                rate.append(6.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("number of extra frames:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("proportion of extra frames:")
        print(add)
        print("correctness score:")
        print(right)
        ret += 4 * (1 - add) * right / (r - l + 1)  # only the correct part of the total counts toward the time factor
        print("score:")
        print(ret)
        return ret
    
    def study():  # parameter search; note this helper is not called in this script
        window = 1
        local = 2
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(10):
            tmp = 1 + i
            for j in range(10):
                Tmp = 2 + j
                print("current parameters: smoothing window "+str(tmp)+", extremum window "+str(Tmp))
                tmp_img = get_img(ansl[lindex],ansr[rindex])
                tmp_img = smooth(tmp_img,tmp)
                tmp_L = get_key_frame(tmp_img)  # get_key_frame takes one argument (the original also passed Tmp)
                ttmp = cal_ans(tmp_L,lindex,rindex)
                if ttmp > mmax:
                    window = tmp
                    local = Tmp
                    mmax = ttmp
                print("--------------------")
        return window,local
    
    def PCA_get_feature(X):
        # k is the number of components to keep
        # subtract the mean of each feature
        mean_X = X.mean(axis = 0)
        X = X - mean_X
        # the data are now centered
        k = 1
        U,S,V = np.linalg.svd(X,full_matrices = False)
        index = 0
        S_sum = np.sum(S)
        now = 0
        P = 0
        while True:
            now += S[index]
            index+=1
            if now >= 0.90 * S_sum:
                P = index / S.shape[0]
                break  # stop once 90% of the spectrum is covered (the original kept looping and always ended with P = 1.0)
            if index == S.shape[0]:
                P = 1.0
                break
        k =  int(P * min(X.shape[1],X.shape[0]))
        # the reduced dimension
        pca = PCA(n_components = k)
        pca.fit(X)  # fit the model on the centered X
        new_x = pca.fit_transform(X)  # the dimensionality-reduced data
        return new_x
    
    swift_img = get_img()
    Frame_number = int(swift_img.shape[0])
    #Change = PCA_get_feature(swift_img)
    cal_L = get_key_frame(swift_img)
    print("结束")
    cal_ans(cal_L,0,ansl.shape[0]-1)
    
    
  • For the special monitoring environment of an underground coal mine, a keyframe extraction algorithm based on the Euclidean distance of frame differences is studied. To address its large keyframe redundancy, the Canny algorithm extracts image edges and performs edge matching so that redundant keyframes are discarded and redundancy drops. Theoretical analysis and...
  • Video keyframe extraction source code

    2012-09-06 11:07:04
    Keyframe extraction algorithms based on mutual information, clustering, and so on; compiles under VC6.0.
  • Video keyframe extraction in Python (based on inter-frame difference)

    2018-12-05 20:35:17
    Video keyframe extraction in Python (based on inter-frame difference)

    In many scenarios we do not want to, or cannot, process every single frame of a video; we would rather pull a few important frames out of the video and work with those. This process is called video keyframe extraction.

    There are many keyframe extraction algorithms; how to implement one depends mainly on how you define a keyframe.

    In other words: for your actual application, what kind of frame in a video counts as a keyframe?

    Today I implemented a fairly generic keyframe extraction algorithm based on inter-frame difference.

    The principle is simple: differencing two frames and taking the average pixel intensity of the result measures how much the two frames changed. So, working from the average inter-frame difference intensity, whenever a frame's content changes strongly relative to the previous frame, we treat it as a keyframe and extract it.

    The flow, briefly:

    First, read the video and compute the inter-frame difference between every pair of consecutive frames, obtaining the average inter-frame difference intensity.

    Then choose one of the following three methods, all based on the inter-frame difference, to extract keyframes:

    1. Use the order of difference intensities

      Sort all frames by average inter-frame difference intensity and take the frames with the largest values as the video's keyframes.

    2. Use a difference-intensity threshold

      Take the frames whose average inter-frame difference intensity exceeds a preset threshold as the video's keyframes.

    3. Use local maxima

      Take the frames at local maxima of the average inter-frame difference intensity as the video's keyframes.

      The results of this method are richer: the extracted keyframes spread evenly across the video.

      Note that when using this method, smoothing the time series of average difference intensities first is a very effective trick: it removes noise and avoids extracting several frames of similar scenes at once.

    I would recommend the third method for extracting keyframes; a condensed sketch follows.
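
    For reference, here is a condensed sketch of method 3 (my own compression of the full script that appears earlier on this page; `smooth_window` is arbitrary and a box filter stands in for the Hanning window used there):

    import cv2
    import numpy as np
    from scipy.signal import argrelextrema

    def keyframes_by_local_maxima(path, smooth_window=25):
        """Average inter-frame difference -> smoothing -> local maxima."""
        cap = cv2.VideoCapture(path)
        diffs, prev = [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cur = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV)
            if prev is not None:
                diffs.append(cv2.absdiff(cur, prev).mean())
            prev = cur
        cap.release()
        smoothed = np.convolve(diffs, np.ones(smooth_window) / smooth_window, mode='same')
        return argrelextrema(smoothed, np.greater)[0] + 1   # +1: diffs[i] compares frames i and i+1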

    Get the source code here

    The original code came from here, but it had a problem: reading a video larger than 100 MB caused an out-of-memory error, so I optimized it and removed the unnecessary memory consumption.

    I experimented on a classic Pokémon clip; the smoothed average inter-frame difference intensity looks like this:

    (figure: plot of the smoothed average inter-frame difference intensity)

    Some of the extracted keyframes:

    (figures: extracted keyframe thumbnails)

    Not a bad result.

    I have only briefly explored video keyframe extraction here, and the final results met the needs of my actual work. If you know this area well, or know of better methods, I would be glad to hear from you.

    Finally, if you are interested in the algorithm, you are welcome to check out my github project AI-Toolbox.

    The project aims to raise efficiency and iterate quickly on new ideas; code contributions are welcome.


    First attempt

    As the first post in this series, here is how I compute an efficiency score for a keyframe extraction algorithm.

    Both timing and correctness are taken into account, weighted 4:6,
    for a full score of 10.

    For the 6-point (correctness) part, capturing a keyframe of the corresponding scene earns +1.
    Correctness should weigh every scene equally, so this part is simply averaged over the scenes.

    For the 4-point part, we compute the ratio of redundant frames to five times the number of required keyframes, capping each scene's ratio at 100%,
    and then average.
    Note that the 4 points can only be earned where the 6-point part scored, so a check is needed.

    Finally the two parts are combined with their weights to give the score.

    Based on
    《A Novel Video Copy Detection Method Based on Statistical Analysis》

    Authors:
    Hye-Jeong Cho et al.

    RCC

    Here we introduce the rank correlation coefficient (RCC), which expresses how correlated two rankings are:

    $RCC = 1 - \frac{6\,\mathrm{diff}}{n(n^2-1)}$; the closer to 1, the more correlated.

    For the difference term, the paper uses the Euclidean distance.

    So what exactly is compared?

    The grayscale information of each image: after converting to grayscale, the image is split into 4 blocks.
    These 4 blocks are converted into an ordinal measure, which is then compared.
    Concretely, compute the average brightness of each of the 4 blocks, sort them, and take the post-sorting indices of the 4 values as the measure.

    Adjacent frames with $RCC < k$ are then taken as keyframes.
    This $k$ can be learned.

    The final efficiency score obtained is 5.76.

    The code follows:

    import numpy as np
    import cv2
    
    ansl = [1,94,132,154,162,177,222,236,252,268,286,310,322,355,373,401,
    423,431,444,498,546,594,627,681,759,800,832,846,932,1235,1369,1438,1529,1581,1847]
    ansr = [93,131,153,161,176,221,235,251,267,285,309,321,354,372,400,
    422,430,443,497,545,593,626,680,758,799,831,845,931,1234,1368,1437,
    1528,1580,1846,2139]  # ground-truth keyframe intervals (355 fixes a 255 typo, matching a later listing)
    ansl = np.array(ansl)
    ansr = np.array(ansr)
    
    cap = cv2.VideoCapture('D:/ai/CV/pyt/1.mp4')
    Frame_rate = cap.get(5)  # frames per second
    Frame_number = cap.get(7)  # total number of frames
    Frame_time = 1000 / Frame_rate  # milliseconds per frame
    
    def get_block(img):
        img = np.array(img)
        row = img.shape[0] // 2
        col = img.shape[1] // 2
        L = []
        for i in range(2):
            for j in range(2):
                L.append(np.sum(img[i*row:(i+1)*row,j*col:(j+1)*col])/(row*col))
        L = np.array(L)
        L = L.ravel()
        L = np.argsort(L)
        return L
    
    def RCC(img1,img2):
        diff = 0
        for i in range(4):
            diff += np.power(img1[i]-img2[i],2)
        rcc = 1 - 6 * diff / (4 * (4 ** 2 - 1))
        return rcc
        
    def get_img(now_time = 0,get_number = Frame_number):  # parameterized to make the parameter study easier
        swift_img = []   # transformed frames
        index = 0        # frame counter
        time = now_time  # current timestamp in ms
        while (cap.isOpened()):
            cap.set(cv2.CAP_PROP_POS_MSEC,time)
            ret,img = cap.read()  # grab a frame
            if not ret:
                break
            img0 = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)  # convert to grayscale
            img1 = get_block(img0)
            swift_img.append(img1)
            time += Frame_time
            index += 1
            if index >= get_number:
                break
    #        if index % 50 ==0:
    #            print("reached frame "+str(index))
        swift_img = np.array(swift_img)
        return swift_img
    
    def get_key_frame(swift_img,standard):
        L = []
        for i in range(swift_img.shape[0]-1):
            tmp = RCC(swift_img[i],swift_img[i+1])
            if tmp < standard:
                L.append(True)
            else:
                L.append(False)
        L.append(False)
        L = np.array(L)
        return L
    
    def preserve(L):
        num = 0
        for i in range(L.shape[0]):
            if L[i] == False:
                continue
            num += 1
            cap.set(cv2.CAP_PROP_POS_MSEC, i * Frame_time)  # seek to frame i (the original only advanced the time on kept frames, a bug)
            ret,img = cap.read()  # grab the frame
            cv2.imwrite('./1.1/{0:05d}.jpg'.format(num),img)  # save the keyframe
    
    def cal_ans(cal_L,l,r):
        rate = []
        add = 0
        for j in range(ansl.shape[0]):
            num = 0
            if not (l <= j and j <= r):
                continue
            ll = ansl[j]
            rr = ansr[j]
            for i in range(cal_L.shape[0]):
                if cal_L[i] == False:
                    continue
                if i + ansl[l] >= ll and i + ansl[l] <= rr:
                    num += 1
            if num == 0:
                rate.append(0.0)
            else:
                if num == 1:
                    rate.append(4.0)  # note: this early version weights correctness 4 and redundancy 6, the reverse of the write-up above
                    continue
                add += num - 1
                rate.append(4.0)
        rate = np.array(rate)
        ret = np.sum(rate) / rate.shape[0]
        print("number of extra frames:")
        print(add)
        add = add / (5 * (r - l + 1))
        add = min(add , 1)
        print("proportion of extra frames:")
        print(add)
        print("correctness score:")
        print(ret)
        ret += 6 * (1 - add)
        print("score:")
        print(ret)
        return ret
    
    def study():
        stdad = 1
        mmax = 0
        lindex = 4
        rindex = 10
        for i in range(11):
            tmp = 1.0 - 0.05 * i
            print("current threshold: "+str(tmp))
            tmp_img = get_img(ansl[lindex],ansr[rindex])
            tmp_L = get_key_frame(tmp_img,tmp)
            ttmp = cal_ans(tmp_L,lindex,rindex)
            if ttmp > mmax:
                stdad = tmp
                mmax = ttmp
            print("--------------------")
        return stdad
    
    standard = study()
    print("final threshold: "+str(standard))
    swift_img = get_img()
    cal_L= get_key_frame(swift_img,standard)
    cal_ans(cal_L,0,ansl.shape[0]-1)
    #preserve(cal_L)
    
    
  • Video keyframe extraction code

    2014-12-25 22:02:46
    Contains keyframe extraction code, plus face detection and other code, implemented with VS + OpenCV
  • For video keyframe extraction, a density-peak clustering algorithm is proposed. It uses HSV histograms to turn high-dimensional video image data into quantifiable low-dimensional data, reducing the computational cost of capturing image features. On this basis, density-peak clustering is applied to the low-dimensional data...
  • Keyframe extraction code written in MATLAB, referencing optical-flow code. Keyframes are extracted from the Euclidean distance, mean, variance, and coefficient of variation of frame differences. The code is debugged and runs with good results
  • Several existing classes of keyframe extraction methods are analyzed, and a new keyframe extraction method based on the vehicle's own features is proposed. The algorithm is general and adaptive, computationally simple, and accurate; it effectively avoids redundancy and can control the number of keyframes. Experiments show that the...
  • Converting video sequences into still images through keyframe extraction and then applying image processing... First, recent compressed-domain keyframe extraction techniques are discussed; then keyframe extraction methods for sensitive-video recognition applications are analyzed, and a fast and effective keyframe extraction scheme is given.
  • This code uses the function videoreader to extract keyframes from a video by computing histogram differences
  • Current compressed-domain keyframe extraction algorithms rely on a single kind of feature, extract keyframes inaccurately, and run inefficiently; a compressed-domain keyframe extraction algorithm based on two-pass curve-curvature detection is therefore proposed. The algorithm uses high-curvature points to mark significant changes of the curve, and on this...
