  • Python word frequency counting

    2018-06-16 08:38:00

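    Word frequency counting with Python

    GitHub: FightingBob [Give me a star, thanks.]

    • Word frequency counting

      Count how many times each English word occurs in a plain-English text file [e.g. 瓦尔登湖(英文版).txt] and write the counts to a file.

    • Implementation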

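    •  import string
       from os import path

       # Text mode (UTF-8 is an assumption about the file): str() on bytes would give the "b'...'" repr.
       with open('瓦尔登湖(英文版).txt', 'r', encoding='utf-8') as text1:
           # Lowercase each word and strip punctuation from both ends.
           words = [word.strip(string.punctuation).lower() for word in text1.read().split()]
       words_index = set(words)  # distinct words
       count_dict = {index: words.count(index) for index in words_index}
       # Build the output path from the script's absolute directory.
       out_path = path.join(path.dirname(path.abspath(__file__)), 'file1.txt')
       with open(out_path, 'a+', encoding='utf-8') as text2:
           text2.write('Word frequency results:' + '\n')
           for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
               text2.write('{}--{} times'.format(word, count_dict[word]) + '\n')
       # No close() calls are needed: each with block closes its file.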

       

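    • Code walkthrough

      • Open the file for reading. Use text mode with an explicit encoding (UTF-8 assumed here) rather than binary mode ('rb'), so that read() returns a str instead of bytes

        •   with open('瓦尔登湖(英文版).txt', 'r', encoding='utf-8') as text1: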

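      • Build the word list

        • First read the content

          •   content = text1.read()
        • Then split it into words (split() with no argument splits on any run of whitespace)

          •   words = content.split()
        • Lowercase each word and strip punctuation from both ends

          •   word.strip(string.punctuation).lower()
        • Remove duplicate words

          •   words_index = set(words)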
      • Build a dictionary mapping each word to its count

        •   count_dict = {index:words.count(index) for index in words_index}
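      • Write the word frequencies

        • Create the output file in the script's own directory and open it in append mode

          •   with open(path.join(path.dirname(path.abspath(__file__)), 'file1.txt'), 'a+') as text2:
        • Write the header line, followed by a newline

          •   text2.write('Word frequency results:' + '\n')
        • Sort the words by count, highest first [key=lambda x:count_dict[x] sorts by value]

          •   sorted(count_dict,key=lambda x:count_dict[x],reverse=True)
        • Write each word and its count on its own line

          •   text2.write('{}--{} times'.format(word,count_dict[word]) + '\n')
        • Release resources: the with statements close text1 and text2 automatically, so explicit text1.close() and text2.close() calls are redundant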
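
The walkthrough above can be condensed into two small functions. This is only a sketch, not the author's code: it swaps words.count() for collections.Counter (which counts in a single pass), and the names count_words, format_report and the sample text are hypothetical stand-ins.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    words = (token.strip(string.punctuation).lower() for token in text.split())
    # Skip tokens that were pure punctuation and are now empty strings.
    return Counter(word for word in words if word)


def format_report(counts):
    """Return one 'word--N times' line per word, most frequent first."""
    return ['{}--{} times'.format(word, n) for word, n in counts.most_common()]


sample = "The pond. The woods, the pond!"  # stand-in text, not from the book
report = format_report(count_words(sample))
print('\n'.join(report))
```

Writing the report lines to file1.txt would then take a single with open(...) block, as in the original.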


     

    Reposted from: https://www.cnblogs.com/littlebob/p/9189794.html

  • Python word frequency counting

    2020-04-10 18:26:49
    import re  # regular expressions
    import collections  # provides Counter for frequency counting

    with open("Text_word_frequency_statistics.txt") as f:
        article = f.read().lower()  # normalize to lowercase
    pattern = re.compile(r"\t|!|,|\n|\.|:|;|\)|\(|\?|\"")
    article = re.sub(pattern, ' ', article)  # replace every matched character with a space
    done = article.split(' ')  # split on single spaces
    remove = ['the','and','of','a','i','in','you','my','he','his',',','s','']  # words to drop
    over = []

    for i in done:
        if i not in remove and i != " ":
            over.append(i)
    counts = collections.Counter(over)  # count the words; this returns a Counter object
    # Alternative: sorted(counts.items(), key=operator.itemgetter(1), reverse=True)
    # The tuple order inside the lambda sets the sort priority: item[0] (the word)
    # is only compared when the counts item[1] are equal.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for word, count in ranked[:10]:  # print the ten most frequent words
        print('{0:<10}'.format(word), '{0:>5}'.format(count))
    

    Everything I wanted to say is in the comments.
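
The tuple sort key used above is worth isolating. The following is a minimal sketch with made-up input; top_words is a hypothetical helper name, not part of the original post.

```python
from collections import Counter


def top_words(words, n=10):
    """Return the n most frequent (word, count) pairs, ties broken alphabetically."""
    counts = Counter(words)
    # Negating the count sorts high counts first; the word itself is only
    # compared when two counts are equal.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


sample = ['pear', 'apple', 'pear', 'fig', 'apple', 'kiwi']  # made-up input
for word, count in top_words(sample, n=3):
    print('{0:<10}{1:>5}'.format(word, count))
```

The explicit key matters when a deterministic alphabetical tie-break is wanted; Counter.most_common() on its own orders by count only.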

  • Python word frequency counting

    2019-08-21 16:55:40
    # Word frequency counting: lowercase every word and strip the punctuation attached to it

    import string

    with open("D:/test.txt", 'r', encoding='utf-8') as text:
        # Store all the words in a list
        words = [word.strip(string.punctuation).lower() for word in text.read().split()]
    # set() turns the list into a set, so each distinct word appears once
    words_index = set(words)
    # Dictionary mapping each word to its number of occurrences
    count_dict = {index: words.count(index) for index in words_index}
    # Write the results to a file; the with block closes it when done
    with open("D:/result.txt", "a", encoding='utf-8') as out_file:
        for word in sorted(count_dict, key=lambda x: count_dict[x], reverse=True):
            print("%-20s" % word, count_dict[word], file=out_file)
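
It may help to see exactly what strip(string.punctuation) removes. The sketch below uses made-up tokens and a hypothetical helper named normalize; it shows that only leading and trailing punctuation is stripped, while apostrophes and hyphens inside a word survive.

```python
import string


def normalize(token):
    """Lowercase a token and strip punctuation from both of its ends."""
    return token.strip(string.punctuation).lower()


# Made-up tokens showing what is and is not removed.
for token in ['"Hello,', "don't", 'ever-boastful.', '...']:
    print(repr(token), '->', repr(normalize(token)))
```

A token made only of punctuation becomes an empty string, so filtering out empty results before counting is usually worthwhile.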

     

  • Python word frequency counting

    2018-03-22 23:47:00


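    Preprocessing for word frequency counting
    Download the lyrics of an English song, or an English article
    Replace every separator such as ,.?!’: with a space
    Convert all uppercase letters to lowercase
    Build the word list
    Build the word frequency counts
    Sort
    Exclude grammatical words: pronouns, articles, conjunctions
    Print the TOP 10 words by frequency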

    s='Robert Zoellick, a former US Trade Representative and head of the World Bank, once said: "Trade was more about politics than economics." Indeed, international trade among nations is all about business, but once politicians step in, it becomes polarizing with unexpected consequences that could lead to a trade war.' \
    'The ever-boastful US President Donald Trump tweeted that "trade wars are good, and easy to win". His newly appointed top economic advisor, Larry Kudlow, should remind him that trade wars are a pyrrhic form of competition in which even the victor is left worse off.' \
    'The US Constitution clearly states: "Congress shall regulate interstate and foreign commerce." It grants authority to the executive branch to negotiate trade agreements, but it has the last word on increasing tariffs, whether it But Trump has no patience to follow such procedures. Instead he is issuing executive orders to satisfy his political base.' \
    'The Republicans in Congress were shocked that their leader would take such a protectionist action, recognizing it was more about politics than national security. Their swift opposition forced Trump to make Canada and Mexico exceptions (to be part of North American Free Trade Agreement negotiations), and eventually minimize the effects on the US' \
    'There will not be a trade war with these countries. If it occurs, it will be with China. The Trump administration has already taken shots that may spark a trade war, including slapping tariffs on solar panels. The next, and most fierce, battlefield in today''s smartphones to enter the US market. Around the corner is Section 301 of the Trade Act of 1974 — originally intended to safeguard patent rights — that will give the US president the authority to limit China'
    # Replace each separator with a space, as the steps above describe.
    # The ’ character is taken from the step list; the original replace('','') had lost it.
    for sep in ',.?!’:"-':
        s = s.replace(sep, ' ')
    list1 = s.lower().split()
    for i in list1:
        print(i)
    myset = set(list1)
    print(myset)
    key = {}
    for i in myset:
        key[i] = list1.count(i)
        print(key[i])
    # Drop grammatical words before ranking.
    for i in {'a','an','the','to','in','on','is','are','too','am'}:
        if i in key:
            key.pop(i)
    sort = sorted(key.items(), key=lambda d: d[1], reverse=True)
    for j in range(10):
        print(sort[j])
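
The same preprocessing steps can also be written with str.translate and collections.Counter. This is a sketch under assumptions, not the author's method: SEPARATORS follows the step list (including ’), STOP_WORDS copies the set used above, and top10 and the shortened sample text are hypothetical.

```python
from collections import Counter

SEPARATORS = ',.?!’:"-'
STOP_WORDS = {'a', 'an', 'the', 'to', 'in', 'on', 'is', 'are', 'too', 'am'}


def top10(text, n=10):
    """Return the n most frequent words that are not stop words."""
    # Map every separator character to a space in a single pass.
    table = str.maketrans(SEPARATORS, ' ' * len(SEPARATORS))
    words = text.translate(table).lower().split()
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


sample = 'Trade wars are good, and easy to win. Trade is trade!'  # shortened stand-in
print(top10(sample))
```

Because most_common(n) returns at most n pairs, this version does not raise an IndexError when the text has fewer than ten distinct words.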

    Reposted from: https://www.cnblogs.com/cairuiqi/p/8627772.html

  • Python word frequency counting example

    2020-07-24 17:17:02
    # Word frequency counting import jieba # word segmentation library import snownlp # sentiment analysis words = '非常时尚鞋子,非常非常非常时尚的一款鞋子,设计好看,设计设计做活动买的,超超超超超超超超超划算。满意。设计好看!' words_list = ...
  • # Python word frequency counting

    2019-05-25 23:52:11
    How to count word frequencies in a text with a Python program ####### First, here is the code: import jieba as j txt=open("threekingdoms.txt","r",encoding="utf8").read() txts=j.lcut(txt) keywords=["却说","二人",...
  • Python word frequency counting (English text)

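    2020-02-12 15:43:27
    Everyone else writes word frequency counters for Chinese text; I have been using Python for several years and am still writing one for English. Oh well, here is the code. text = """ British newspapers are much smaller than they used to be and their readers are often in a ...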
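  • In Python frequency counting, you often check whether a char is already a key and, if it is, increment its value. One way to supply a default here is dict.get(c, 0), where the second argument is the default value. class Solution: def canConstruct(self, ransomNote: str, magazine: str) ->...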
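
The dict.get(c, 0) idiom described in this entry can be sketched as follows. The truncated canConstruct code is not reproduced here; count_chars and can_build are hypothetical helpers written only to illustrate the default-value lookup.

```python
def count_chars(text):
    """Count each character with the dict.get(key, default) idiom."""
    counts = {}
    for c in text:
        # get() returns 0 when c has not been seen yet, so no membership test is needed.
        counts[c] = counts.get(c, 0) + 1
    return counts


def can_build(note, letters):
    """Hypothetical helper: True if note can be spelled using the characters in letters."""
    available = count_chars(letters)
    for c in note:
        if available.get(c, 0) == 0:
            return False
        available[c] -= 1
    return True


print(count_chars('banana'))
print(can_build('aa', 'aab'), can_build('aa', 'ab'))
```

The same default-value lookup is what collections.Counter provides automatically, since a Counter returns 0 for keys it has not seen.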
  • Python word frequency counting

    2021-01-27 20:26:09
    =0] chapter.txt[1] import jieba word_list=jieba.lcut(chapter.txt[1]) word_list[:10] # count with pandas df =pd.DataFrame(word_list,columns=['word']) df.head(30) result = df.groupby(['word']).size() print...
