  • Huffman Trees

    2017-02-17 19:41:06
    Huffman Trees

    Constructing a Huffman tree

    A Huffman tree is also known as an optimal binary tree.


    In Huffman coding, frequently used symbols get short code words and rarely used symbols get long ones. A binary tree can be used to construct such a code.
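    This idea can be sketched in code (a minimal illustration, not from the original post; `HNode` and `huffmanCodes` are names I made up): repeatedly merge the two lightest trees, then read each symbol's code off its root-to-leaf path.

    ```cpp
    #include <map>
    #include <queue>
    #include <string>
    #include <vector>
    using namespace std;

    struct HNode {
        long long weight;
        char sym;                           // meaningful only for leaves
        HNode *left = nullptr, *right = nullptr;
    };

    struct ByWeight {                       // min-heap: lightest tree on top
        bool operator()(const HNode* a, const HNode* b) const {
            return a->weight > b->weight;
        }
    };

    // Walk the finished tree; '0' for a left edge, '1' for a right edge.
    static void collect(const HNode* n, const string& prefix, map<char, string>& codes) {
        if (!n->left && !n->right) {
            codes[n->sym] = prefix.empty() ? "0" : prefix;
            return;
        }
        collect(n->left, prefix + "0", codes);
        collect(n->right, prefix + "1", codes);
    }

    map<char, string> huffmanCodes(const map<char, long long>& freq) {
        priority_queue<HNode*, vector<HNode*>, ByWeight> pq;
        for (auto& kv : freq) pq.push(new HNode{kv.second, kv.first});
        while (pq.size() > 1) {             // merge the two lightest trees
            HNode* a = pq.top(); pq.pop();
            HNode* b = pq.top(); pq.pop();
            pq.push(new HNode{a->weight + b->weight, 0, a, b});
        }
        map<char, string> codes;
        collect(pq.top(), "", codes);       // nodes are leaked for brevity
        return codes;
    }
    ```

    For frequencies {A:5, B:1, C:1, D:1}, `huffmanCodes` assigns the frequent symbol A a 1-bit code and the rare symbols 2- or 3-bit codes.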


    Summary

    A Huffman tree is a special kind of binary tree.
    Huffman trees are used in information coding and data compression.
    Huffman trees are a foundation of modern compression algorithms.

  • A sketch of why Huffman's construction is optimal, summarized from a discrete mathematics textbook, plus a worked contest problem (Entropy)

    The full proof for Huffman trees can be found in discrete mathematics textbooks; the gist, as I summarize it, is as follows:

    When we look up words, some are queried often and others rarely. If we build a tree for lookups, the frequently queried codes (symbols after lossless compression) should sit at the smallest possible depth, which saves both time and space. Different prefix codes for the same symbols produce trees of different weighted path lengths, and the tree whose weighted path length is smallest is called the optimal tree; so the problem of optimally encoding the English letters is exactly the problem of finding the optimal tree. Huffman's algorithm greedily takes the two lightest leaves, makes them siblings under a new node whose weight is their sum, and repeats until every node has been merged into a single tree, which is the optimal tree.
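    A minimal sketch of this greedy step (my own helper, not from the text): because each merged weight is counted once for every later merge above it, the sum of all merge results equals the tree's weighted path length.

    ```cpp
    #include <queue>
    #include <vector>
    using namespace std;

    // Weighted path length of the optimal tree over the given leaf weights.
    long long huffmanWPL(const vector<long long>& weights) {
        priority_queue<long long, vector<long long>, greater<long long>> pq(
            weights.begin(), weights.end());
        long long wpl = 0;
        while (pq.size() > 1) {
            long long a = pq.top(); pq.pop();   // the two lightest trees
            long long b = pq.top(); pq.pop();
            pq.push(a + b);                     // they become siblings
            wpl += a + b;                       // merged weight counts once per level
        }
        return wpl;
    }
    ```

    For the leaves {5, 6, 7, 8, 15}, for instance, the merges produce 11, 15, 26, and 41, so the WPL is 93.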

    Take a contest problem as an example:

    Entropy

    Time Limit:1000MS  Memory Limit:65536K

    Total Submit:23 Accepted:13

    Description

    An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.

    English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.

    In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint was not enforced, then such a decoding would be impossible.

    Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.

    As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”, “I” and “N” only occur once, however, so they will have the longest codes.

    There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with “100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.

    Input

    The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.

    Output

    For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.

    Sample Input

    AAAAABCD

    THE_CAT_IN_THE_HAT

    END

    Sample Output

    64 13 4.9

    144 51 2.8

    Here is the code; it should be easy to follow. Huffman trees feel quite practical, although they don't seem to come up much in ACM contests.

    ```cpp
    #include <iostream>
    #include <cstdio>
    #include <cstring>
    #include <string>
    using namespace std;

    // Indices 0..25 are letters 'A'..'Z', index 26 is '_'; internal nodes follow.
    // Note: the code assumes the input contains only uppercase letters and '_',
    // as in the samples, even though the statement says "alphanumeric".
    struct { int lc, rc, fa; } tree[64];
    string str;
    int fre[60];
    int best;

    // Accumulate frequency * depth over all leaves, i.e. the optimal bit count.
    void huffman(int n, int dep) {
        if (fre[n] == 0) return;
        if (n <= 26) best += fre[n] * dep;   // leaf
        else {
            huffman(tree[n].lc, ++dep);
            huffman(tree[n].rc, dep);
        }
    }

    int main() {
        int i, j, left, right;
        while (cin >> str) {
            if (str == "END") break;
            int len = str.length();
            for (i = 0; i <= 60; i++) tree[i].fa = -1;
            memset(fre, 0, sizeof(fre));
            for (i = 0; i < len; i++) {
                if (str[i] == '_') fre[26]++;
                else fre[str[i] - 'A']++;
            }
            int node = 26;
            while (1) {
                // Pick the two unmerged nodes with the smallest nonzero frequency.
                int min = 1000;
                for (j = 0; j <= node; j++)
                    if (tree[j].fa == -1 && fre[j] && min > fre[j]) {
                        min = fre[j];
                        left = j;
                    }
                min = 1000;
                for (i = 0; i <= node; i++)
                    if (tree[i].fa == -1 && fre[i] && i != left && min > fre[i]) {
                        min = fre[i];
                        right = i;
                    }
                if (min == 1000) break;      // fewer than two trees remain
                // Merge them under a new internal node.
                fre[++node] = fre[left] + fre[right];
                tree[node].lc = left;
                tree[node].rc = right;
                tree[node].fa = -1;
                tree[left].fa = node;
                tree[right].fa = node;
            }
            best = 0;
            if (node == 26) best = len;      // single distinct glyph: 1 bit each
            else huffman(node, 0);
            len *= 8;
            printf("%d %d %.1f\n", len, best, len * 1.0 / best);
        }
        return 0;
    }
    ```
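    As a sanity check (a standalone helper of my own, not part of the contest solution), the optimal bit counts from the statement can be recomputed without building an explicit tree, since the sum of all greedy merge results equals the encoded length:

    ```cpp
    #include <queue>
    #include <string>
    #include <vector>
    using namespace std;

    // Length in bits of an optimal prefix-free encoding of s.
    long long optimalBits(const string& s) {
        long long freq[256] = {0};
        for (unsigned char c : s) freq[c]++;
        priority_queue<long long, vector<long long>, greater<long long>> pq;
        for (long long f : freq)
            if (f) pq.push(f);
        if (pq.size() == 1) return (long long)s.size();  // one distinct glyph: 1 bit each
        long long bits = 0;
        while (pq.size() > 1) {
            long long a = pq.top(); pq.pop();
            long long b = pq.top(); pq.pop();
            pq.push(a + b);
            bits += a + b;   // each glyph's code length = number of merges it joins
        }
        return bits;
    }
    ```

    This gives 13 for "AAAAABCD" and 51 for "THE_CAT_IN_THE_HAT", matching the sample output.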

  • Implementing a Huffman tree in Python

    2020-12-21 00:45:51
    Implementing a Huffman tree in Python. A Huffman tree is a special binary tree, the one with the smallest weighted path length, also known as an optimal binary tree. Given N weights as the weights of N leaf nodes, construct a binary tree; if its weighted path length reaches the minimum, it is called ...
  • Huffman trees in Java: learning notes, a step-by-step construction idea, and code

    Huffman Trees

    Written to record what I've learned while studying Java.
    Over the past few days I have been studying data structures and algorithms and reached tree structures; two days ago I briefly covered pre-order traversal of a threaded binary tree. Today's topic is the Huffman tree, which is a bit harder than the earlier arrays and linked lists. Huffman coding in particular took a whole afternoon of coding while following the teacher's explanation.

    First, a brief introduction. The number each leaf node holds I will call its weight. Another concept is the level: if the root is level 1, each row of the tree is one level, so every leaf has a level, and multiplying a leaf's weight by its depth (the level difference between the leaf and the root) gives its weighted path length. Among all arrangements, the tree whose total weighted path length is smallest is the Huffman tree (the middle tree on my teacher's slide; image omitted).

    To build one in code, the idea is: create a Node class for the tree nodes, with value, left, and right fields. Take a one-dimensional int array whose values become the Node values. The array must be sorted, ascending by default here (I know seven or eight sorting methods; I'll go through them in detail another time). After sorting, create a Node for each value and store the nodes in a list. Take the elements at indices 0 and 1, create a new Node whose value is their sum and whose left and right children are those two nodes, remove the two originals from the list, add the new node, and re-sort the list.
    Repeat until only one element remains in the list; it becomes the root, from which we do a pre-order traversal.
    The code follows. First, the node class:

    
    ```java
    import java.util.ArrayList;
    import java.util.Collections;

    class Node implements Comparable<Node> {
        public int value;
        public Node left;
        public Node right;

        public Node(int value) {
            this.value = value;
        }

        public Node() {
        }

        @Override
        public String toString() {
            return "Node{" + "value=" + value + '}';
        }

        @Override
        public int compareTo(Node o) {
            // Sort ascending by value so the two smallest nodes
            // are always at the front of the list.
            return Integer.compare(this.value, o.value);
        }
    }
    ```

    Then the Huffman tree class:

    ```java
    public class HuffmanTree {
        public static void main(String[] args) {
            int[] array = new int[]{2, 3, 5, 6, 4};
            Node root = createTree(array);
        }

        public static Node createTree(int[] arr) {
            ArrayList<Node> list = new ArrayList<>();
            for (int i = 0; i < arr.length; i++) {
                list.add(new Node(arr[i]));
            }
            Collections.sort(list);
            while (list.size() > 1) {
                // Merge the two smallest nodes under a new parent.
                Node node1 = list.get(0);
                Node node2 = list.get(1);
                Node node = new Node(node1.value + node2.value);
                node.left = node1;
                node.right = node2;
                list.remove(node1);
                list.remove(node2);
                list.add(node);
                Collections.sort(list);
            }
            prologue(list.get(0));   // pre-order traversal from the root
            return list.get(0);
        }

        // Pre-order traversal: node, then left subtree, then right subtree.
        public static void prologue(Node node) {
            System.out.println(node.value);
            if (node.left != null) {
                prologue(node.left);
            }
            if (node.right != null) {
                prologue(node.right);
            }
        }
    }
    ```

    Tomorrow I'll work through Huffman coding.
    
    
  • Huffman tree - source code

    2021-02-21 17:09:56
    HuffmanTree was an assignment for Computer Science III - Data Structures, completed in November 2018.
  • An overview of Huffman trees: definition, key concepts (path length, node weight, WPL), and the construction procedure

    I. Overview

    The Huffman tree (哈夫曼树 or 霍夫曼树 in Chinese) is the optimal binary tree.

    Definition: given n weights as n leaf nodes, construct a binary tree; if the tree's weighted path length reaches the minimum, the tree is called a Huffman tree.

    1.1 A few concepts

    (01) Path and path length

    Definition: in a tree, the route from a node down to a reachable child or descendant node is called a path, and the number of branches on the path is the path length. If the root is defined to be at level 1, the path length from the root to a node at level L is L-1.

    Example: nodes 100 and 80 have path length 1, nodes 50 and 30 have path length 2, and nodes 20 and 10 have path length 3.

    (02) Node weight and weighted path length

    Definition: if a node in the tree is assigned a number with some meaning, that number is the node's weight. The node's weighted path length is the path length from the root to that node multiplied by the node's weight.

    Example: node 20 has path length 3, so its weighted path length = path length * weight = 3 * 20 = 60.

    (03) Weighted path length of a tree

    Definition: the weighted path length of a tree is the sum of the weighted path lengths of all its leaf nodes, written WPL.

    Example: in the figure, the tree's WPL = 1*100 + 2*50 + 3*20 + 3*10 = 100 + 100 + 60 + 30 = 290 (only the leaves 100, 50, 20 and 10 count; 80 and 30 are internal nodes).

    Example

    Both trees above have {10, 20, 50, 100} as their leaf nodes.

    Left tree: WPL = 2*10 + 2*20 + 2*50 + 2*100 = 360

    Right tree: WPL = 290

    The left tree's WPL is greater than the right tree's. You can work out arrangements other than these two, but the right tree is in fact the Huffman tree for {10, 20, 50, 100}.

    II. Constructing a Huffman tree

    Suppose there are n weights; the Huffman tree built from them has n leaf nodes. With the n weights w1, w2, ..., wn, the construction rule is:

    1. Treat w1, w2, ..., wn as a forest of n trees, each consisting of a single node;

    2. In the forest, pick the two trees whose roots have the smallest weights and merge them as the left and right subtrees of a new tree, whose root weight is the sum of the two subtree roots' weights;

    3. Remove the two chosen trees from the forest and add the new tree to it;

    4. Repeat steps 2 and 3 until only one tree remains in the forest; that tree is the Huffman tree.

    Let's construct a Huffman tree from {5, 6, 7, 8, 15}.

    Step 1: create the forest; it contains 5 trees with weights 5, 6, 7, 8, 15.

    Step 2: in the forest, pick the two trees with the smallest root weights (5 and 6) and merge them as the left and right children of a new tree (which goes left and which goes right doesn't matter; here the smaller one becomes the left child). The new tree's weight is the sum of its children's weights, i.e. 11. Then remove tree 5 and tree 6 from the forest and add the new tree (tree 11).

    Step 3: pick the two trees with the smallest root weights (7 and 8) and merge them; the new tree's weight is 15. Remove tree 7 and tree 8 from the forest and add tree 15.

    Step 4: pick the two trees with the smallest root weights (11 and 15) and merge them; the new tree's weight is 26. Remove tree 11 and tree 15 from the forest and add tree 26.

    Step 5: pick the two trees with the smallest root weights (15 and 26) and merge them; the new tree's weight is 41. Remove tree 15 and tree 26 from the forest and add tree 41.

    At this point only one tree (tree 41) remains in the forest. That tree is the Huffman tree we want!
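    The five steps above can be replayed mechanically; this sketch (my own `mergeSequence` helper, not from the article) records the root weight created by each merge:

    ```cpp
    #include <queue>
    #include <vector>
    using namespace std;

    // Weights of the new trees created while building the Huffman tree.
    vector<long long> mergeSequence(const vector<long long>& weights) {
        priority_queue<long long, vector<long long>, greater<long long>> pq(
            weights.begin(), weights.end());
        vector<long long> merges;
        while (pq.size() > 1) {
            long long a = pq.top(); pq.pop();   // the two smallest roots
            long long b = pq.top(); pq.pop();
            merges.push_back(a + b);            // new tree's root weight
            pq.push(a + b);
        }
        return merges;                          // the last entry is the final root
    }
    ```

    For {5, 6, 7, 8, 15} it returns {11, 15, 26, 41}, matching steps 2 through 5, with 41 as the final root.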

    Code: huffman, under data-004-tree at the linked address

    Reference:

    http://www.cnblogs.com/skywang12345/p/3706833.html

  • Huffman coding is one of the classic applications of Huffman trees in communications. It is widely used to compress data files, with compression ratios usually between 20% and 90%; the more repetitive the data, the higher the ratio. Huffman coding is a variable-length code (VLC), proposed by Huffman, also known as ...
  • Huffman trees in C++

    2020-11-03 09:26:49
    This method was devised by Huffman and is called the Huffman tree. 2. Constructing a Huffman tree: for transmitting the text "BADCADFEED", since only the six characters "ABCDEF" repeat, they can be encoded as follows: the receiver can then decode one character for every 3 bits ...
  • Lesson 38: Huffman Trees. Copyright notice: the copyright of this courseware, its printed materials, and its videos belongs to 成都国嵌信息技术有限公司 ...
  • Contents: Preface; the idea of Huffman coding; fixed-length vs. variable-length codes; WPL; Huffman trees; building a Huffman tree; why the Huffman tree minimizes WPL; a minimal implementation; time complexity. Preface: the Huffman tree in this section can be viewed as an application of a min-heap, and Huffman coding is a compression algorithm designed by Huffman. Huffman coding ...
