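  • Python: get the size of specified files in a folder, the total size of all files in that folder, and the most frequent word in the largest file. I'm a beginner, so please point out any mistakes.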
  • Approach: handle the punctuation and letter case in the English text, count each word's occurrences with a dictionary, and finally sort with sorted().

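    # Use maketrans to map punctuation to spaces (both arguments must have the same length)
    punctuation = ',.!"?:[]-'
    table = str.maketrans(punctuation, ' ' * len(punctuation))
    # Open the file to analyze
    with open(r'C:\Users\CryptFiend\Downloads\python\1.txt') as f:
        file1 = f.read()
    # Apply the mapping so every listed punctuation mark becomes a space, then lowercase everything
    f1 = file1.translate(table).lower()
    # Split the text into words and store them in a list
    wordlist = f1.split()
    # Build a dictionary that counts how many times each word occurs
    d1 = {}
    for word in wordlist:
        d1[word] = d1.get(word, 0) + 1
    # Sort by count with sorted() and print the top three
    itemli = sorted(d1.items(), key=lambda x: x[1], reverse=True)
    print(itemli[0:3])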

    Output:
    [('the', 23), ('to', 13), ('of', 11)]
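
The dictionary-plus-sorted() approach above can also be written with collections.Counter from the standard library. The following is only a sketch: the function name top_words and the regular expression used for tokenizing are assumptions made for illustration, and the tokenizing rule differs slightly from the maketrans version (it keeps apostrophes and drops digits).

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most common lowercase words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of letters and apostrophes,
    # so "Clinton's" stays a single token.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "the cat saw the other cat near the door"
print(top_words(sample, 2))  # [('the', 3), ('cat', 2)]
```

To run it on a file, read the file into a string first and pass that string as text.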

    #1.txt

    Hillary Clinton's visit to India suffered another setback this week as the former secretary of state fractured her wrist after slipping in the bathtub at the five-star resort where she was staying, according to a report by DNA India.

    The website reported that Clinton was taken to a hospital in the city of Jodphur at around 5 a.m. local time Wednesday. Clinton underwent an X-ray and a CT scan that confirmed a hairline fracture of her right wrist.

    The Times of India reported that Clinton had been given a plaster bandage and advised to go for another checkup in three days. The injury does not impact Clinton's ability to travel.

    The Times of India and DNA India both reported that Clinton had been treated for pain in her right hand since she arrived in Jodphur Tuesday afternoon. The pain forced her to cancel a planned visit to the 15th-century Mehrangarh Fort Tuesday evening.

    Earlier in the week, video showed Clinton slipping on stairs twice as she visited the Jahaz Mahal in the ancient city of Mandu. Clinton appeared to use her right had to catch herself on the stairs, but it was not immediately clear whether this fall was the source of her injury.

    At the time of her injury, Clinton was staying at the Umaid Bhawan Palace, which houses the onetime ruling family of Jodphur and also functions as a hotel -- offering rooms from $700 per night.

    Clinton attracted controversy earlier in her visit to India. At a conference in Mumbai over the weekend, she again suggested that racism and misogyny were explanations for her loss in the 2016 presidential election.

    "I won the places that represent two-thirds of America's gross domestic product," Clinton said. "So I won the places that are optimistic, diverse, dynamic, moving forward. And [President Trump's] whole campaign, 'Make America Great Again,' was looking backwards."

    "You know, you didn't like black people getting rights, you don't like women, you know, getting jobs," she went on. "You don't want, you know, to see that Indian American succeeding more than you are."

    Clinton also claimed that white women voted for Trump because they succumbed to "a sort of ongoing pressure to vote the way that your husband, your boss, your son, whoever, believes you should."




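  • Write a text-statistics script that computes and prints statistics about a text file: how many characters, lines, and words it contains, plus the 10 most frequent words in order.

    That completes the introductory Python tutorial. Below is my graduation script (part of it is source code from the book, part is practice exercises I added myself):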

    # Write a text-statistics script: compute and print statistics about a text file, including its character, line, and word counts and the 10 most frequent words in order
    import time
    # Characters to keep; '\n' is included so that words on adjacent lines are not glued together
    keep=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',' ','-',"'",'\n']
    stop_words=['the','and','i','to','of','a','you','my','that','in','she','he','her','his','it','be','was','had']
    def normalize(s):
        # Lowercase s and drop every character that is not in keep
        result=''
        for c in s.lower():
            if c in keep:
                result+=c
        return result
    def make_dict(s):
        # Map each word to the number of times it occurs
        words=normalize(s).split()
        d={}
        for w in words:
            if w in d:
                d[w]+=1
            else:
                d[w]=1
        return d
    def file_status(f):
        with open(f) as fopen:
            c=fopen.read()
        ''' Alternative: read one line at a time
        fopen=open(f)
        c=''
        for line in fopen:
            c+=line
        '''
        print(f,'status:')
        print('Length:',len(c))
        print('Lines:',c.count('\n'))
        print('Words:',len(normalize(c).split()))
        d=make_dict(c)
        total=sum(d[w] for w in d)
        print('Words (from the dictionary):',total)
        print('Distinct words:',len(d))
        # Average length over all word occurrences, so each word is weighted by its count
        print('Average word length:',sum(len(w)*d[w] for w in d)/total)
        print('Words that occur only once:',len([w for w in d if d[w]==1]))
        lst=[(d[w],w) for w in d]
        lst.sort()
        lst.reverse()
        print('Top 10 most frequent words and their counts:')
        i=1
        for count,word in lst[:10]:
            print('%d.%4d %s'%(i,count,word))
            i+=1
        print('Top 10 most frequent words and their counts (function words removed):')
        j=1
        for count,word in lst:
            if word not in stop_words:
                print('%d.%4d %s'%(j,count,word))
                j+=1
            if j==11:
                break
    start_time=time.time()
    # Raw string so the backslashes in the Windows path are not treated as escapes
    file_status(r'D:\Code\python\pg1342.txt')
    end_time=time.time()
    print('Total time:',str(end_time-start_time))

    A pat on the back for myself ^^

    Appendix: the tutorial is 《python编程入门(第3版)》, by Toby (Canada)
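
The script above mentions reading the file one line at a time as an alternative. A minimal sketch of that idea follows; the name stream_stats and its return shape are assumptions made for illustration, not part of the book's code. Because each line is normalized separately, words on adjacent lines cannot run together, and the whole file never has to be held in memory.

```python
from collections import Counter

# Same character whitelist as the script above: letters, space, hyphen, apostrophe.
KEEP = set("abcdefghijklmnopqrstuvwxyz -'")


def normalize(s):
    """Lowercase s and drop every character that is not in KEEP."""
    return "".join(c for c in s.lower() if c in KEEP)


def stream_stats(lines, stop_words=()):
    """Return (characters, lines, words, ranked) for an iterable of text lines.

    ranked is a list of (word, count) pairs, most frequent first,
    with the given stop words left out.
    """
    chars = 0
    line_count = 0
    counts = Counter()
    for line in lines:
        chars += len(line)
        line_count += line.count("\n")
        counts.update(normalize(line).split())
    stop = set(stop_words)
    ranked = [(w, c) for w, c in counts.most_common() if w not in stop]
    return chars, line_count, sum(counts.values()), ranked


demo = ["The cat sat.\n", "The dog sat, the end.\n"]
chars, line_count, words, ranked = stream_stats(demo, stop_words=["the"])
print(chars, line_count, words, ranked[0])  # 35 2 8 ('sat', 2)
```

An open file object can be passed directly as lines, since iterating over a file yields one line at a time.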

  • The program I wrote below counts the characters, the lines, and the most frequent words in a file.

    Python programming: find the 10 most frequent words in a file and list them in order of frequency.

    # -*- coding:utf-8 -*-

    #### First strip the irrelevant characters, such as punctuation
    def normallize(s):
        result = ''
        for w in s.lower():
            if w in keep:
                result += w
        return result

    #### Next split the string into words and build the word dictionary
    def make_freq_dict(s):
        s = normallize(s)
        words = s.split()
        freq = {}
        for w in words:
            if w in freq:
                freq[w] += 1
            else:
                freq[w] = 1
        return freq

    #### Count the total number of words
    def words_num(s):
        d = make_freq_dict(s)
        count = 0
        for w in d:
            count += d[w]
        return count

    #### Turn the dictionary into (count, word) tuples, then sort the tuples
    #### Note how tuples sort: by the first element first, then by the ones after it
    def words_order(s):
        d = make_freq_dict(s)
        lst = []
        for w in d:
            tuple_dict = (d[w], w)
            lst.append(tuple_dict)
        lst.sort()
        lst.reverse()
        return lst

    if __name__ == "__main__":
        # Characters to keep; "\n" is included so words on adjacent lines stay separate
        keep = {"a", "b", "c", "d", "e", "f", "g", "h", "i",
                "j", "k", "l", "m", "n", "o", "p", "q", "r",
                "s", "t", "u", "v", "w", "x", "y", "z", " ", "-", "\'", "\n"}

        with open("bill", 'r') as f:
            file_1 = f.read()

        characters_num = len(file_1)
        lines_num = file_1.count("\n")
        print(characters_num)
        print(lines_num)

        print(words_num(file_1))

        d = words_order(file_1)
        i = 1
        for count, word in d[:10]:
            print(i, count, word)
            i += 1
    
    
    
    Output:

    640
    14
    106
    1 6 the
    2 5 thy
    3 4 to
    4 3 and
    5 2 world's
    6 2 thou
    7 2 thine
    8 2 that
    9 2 tender
    10 2 self
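
The output above shows how tuple sorting breaks ties: words with the same count come out in reverse alphabetical order (thou, thine, that, tender, self), because (count, word) tuples are compared element by element and the whole list is then reversed. The sketch below, with a made-up frequency dictionary and a hypothetical helper named rank, contrasts that with a sort key that keeps counts descending but puts tied words in alphabetical order.

```python
def rank(freq):
    """Sort (word, count) pairs by count descending, then by word ascending."""
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


freq = {"thou": 2, "the": 6, "self": 2, "thy": 5}

# Approach used in the program above: sort (count, word) tuples, then reverse.
pairs = sorted(((count, word) for word, count in freq.items()), reverse=True)
print(pairs)       # [(6, 'the'), (5, 'thy'), (2, 'thou'), (2, 'self')]

# Alternative key: negate the count so ties fall back to alphabetical order.
print(rank(freq))  # [('the', 6), ('thy', 5), ('self', 2), ('thou', 2)]
```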
    


  • The data being processed is a CSV file like the one below; the counts were already computed with MapReduce.

    you,3768
    i,3930
    not,3981
    this,4208
    at,4292
    on,4714
    with,4737
    which,5506
    is,6504
    had,6564
    his,6813
    it,7026
    that,8413
    was,9251
    he,10280
    in,11813
    to,14663
    a,15366
    and,15865
    of,21107
    the,43538

    Here is the code that draws the bar chart:

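    #!/usr/bin/env python
    import numpy as np
    import matplotlib.pyplot as plt


    # Define a custom structured data type t with numpy's dtype
    t = np.dtype([('word',str,40),('quantity',int)])

    # Read the CSV file with numpy's loadtxt
    # delimiter=',' sets the separator to a comma
    # usecols=(0,1) selects the first two columns of the CSV file
    # dtype=t sets the output type to the custom type t; otherwise loadtxt tries to convert to float and raises an exception
    # unpack=True splits the fields apart, so columns 0 and 1 are stored in word and count respectively
    word,count = np.loadtxt('sorted.csv',delimiter=',',usecols=(0,1),dtype=t,unpack=True)


    # Create a numpy array holding 0-19
    X = np.arange(20)
    # Take the last twenty elements, the 20 largest values in count (the file is already sorted)
    Y = count[-20:]

    # Both plt.axes and plt.figure create a 'canvas' on which the chart is drawn; figure is used here
    #plt.axes([0.025,0.025,0.95,0.95])
    plt.figure(figsize=(14,6), dpi=80)
    # plt.bar draws the bar chart; facecolor and edgecolor set the fill and outline colors
    plt.bar(X, Y, facecolor='#9999ff', edgecolor='white')

    # Show the exact value above each bar
    # (bars are centered on x by default in Matplotlib 2.0 and later, so no x offset is needed)
    for x,y in zip(X,Y):
        plt.text(x, y+0.05, '%d' % y, ha='center', va='bottom')

    # Set the axis ranges and tick labels
    plt.xlim(-0.5, X.max()+0.5)
    plt.xticks(X, word[-20:])
    plt.ylim(0, Y.max()*1.1)
    plt.yticks([])
    # Save the bar chart that was just drawn
    plt.savefig('bar_ex.png', dpi=300)
    # Display the bar chart
    plt.show()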
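
The plotting code relies on sorted.csv already being in ascending order of count. If that is not guaranteed, the rows can be sorted before plotting. Here is a minimal standard-library sketch; the function name load_top is an assumption made for illustration, and it takes any open text file (or file-like object) holding word,count rows.

```python
import csv
import io


def load_top(csv_file, n=20):
    """Return the n rows with the largest counts as (word, count), ascending."""
    # Skip empty rows so a trailing blank line does not break the parse.
    rows = [(row[0], int(row[1])) for row in csv.reader(csv_file) if row]
    rows.sort(key=lambda row: row[1])
    return rows[-n:]


sample = io.StringIO("of,21107\nthe,43538\nand,15865\n")
top = load_top(sample, n=2)
words = [w for w, _ in top]
counts = [c for _, c in top]
print(words, counts)  # ['of', 'the'] [21107, 43538]
```

The two lists can then be passed to plt.bar and plt.xticks in place of count[-20:] and word[-20:].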
    

    The resulting chart is below, and it is pretty ugly:



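  • Recently my manager gave me a task: count how many times each word occurs in a file and list the 5 most frequent words. This article walks through an approach to counting words in Python; readers who need it can use it as a reference.
  • "A readable file has ten thousand lines, each line holds a single word, and words may repeat; find the 10 words that occur most often in those ten thousand lines." 2. Approach: first read the file into a list, then deduplicate it with a set to get a reference list, sort in descending order and take the top 10 (the largest, that is, the 10 most frequent...
  • Analysis: in counts.get(w, 0), get() checks whether the dictionary counts contains the key w (the word) and, if so, returns the corresponding value...The count() method counts how many times a given character occurs in a string; its optional parameters are the start and end positions of the search. Reversing a string: s
  • Below is the concrete implementation, which reads words from importthis.txt and finds the 5 most frequent ones. # -*- coding:utf-8 -*- import io import re class Counter: def __init__(self, path): """ :param path: file...
  • Find the most frequent elements in a sequence. Method 1: from collections import Counter words = [ 'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes', 'the', 'eyes', 'the', 'eyes', ...
  • Use Counter from collections to find the most frequent words in an article: #!/usr/bin/env python # -*- coding: utf-8 -*- """ Created on 4/21/16 @author: Jiezhi.G@gmail.com Blog: jiezhi.github.io Reference...
  • Use Python to count all the words in a string: count how many times each word occurs and display the most frequent one text = "ga bu zo meuh ga zo bu meuh meuh ga zo zo meuh zo bu zo" items = text.split(' ') counters = ...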
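  • The task is to find the most frequent 9-letter word in RFC 3280. Save the text of RFC 3280 locally, then process it with the following code. ---- import re text = open("in.txt",'r').read() words = re.split('[^a-zA-Z]',text) di
  • Word counting is widely used in data analysis, especially for extracting the most frequent words from large volumes of text, which often supports analysis of public opinion and trending topics. Below, hot-word statistics are computed for one text in several different ways; let's learn together, and feel free to leave a comment below...
  • 1. Finding the most important word means finding the most frequent word in a diary entry. Opening the entry also involves file handling, so we only need to open each entry in the diary folder in turn, record the most frequent word in each one, and print it...
  • Scenario 1: in a sequence [2, 4, 65, 9, 4, 5, 9, …], find the 3 most frequent elements and their counts. Scenario 2: for some...find the most frequent value in a list and its count. Use a list comprehension to build a list of 30 numbers, each in the range 0~20
  • Return the most frequent word that is not in the banned list. The problem guarantees that at least one word is not banned and that the answer is unique. Banned words are given in lowercase without punctuation. Words in the paragraph are case-insensitive. The answer is all lowercase. (Notes: 1. The answer is...
  • Description: read text data containing a number of words, and all...(1) the most frequent word and its count; on a tie, output the lexicographically largest word. (2) the least frequent word and its count; on a tie, output the lexicographically smallest word. (3) each
  • First ask the user for a positive integer n, then count the occurrences of punctuation marks and words in the English passage below and, based on n, print the n most frequent punctuation marks and words in descending order (punctuation first, then words, with the printed punctuation marks and words separated by...
  • Requirement: analyze three English articles and compute the 10 most frequent words. The logic is clear and simple: use Python to read several txt files, write their contents into a new txt file, then run a word-frequency count on the new file to get the final result. Code...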
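  • Python case study: text statistics

    2019-06-04 19:31:28
    Reference: 《Python编程入门...Besides the word count, we also want to know the top 10 most frequent words in the file (some function words can be excluded), listed in order of frequency. #wordstats.py # the set of all characters to keep keep = {'a', 'b', 'c',...
  • ### This game is meant to compute and print...# and find the 10 most frequent words, sorted by count. (File types are mainly .txt .xls .doc-- written by LiSongbo Words = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l'...
  • For example, open the file read-only, then count and print how many characters, lines, and words it contains,# and find the 10 most frequent words, sorted by count. (File types are mainly .txt .xls .doc-- written by LiSongboWords = {'a', 'b', 'c', 'd', 'e', 'f...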
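  • Python: word-frequency counting (English + Chinese)

    1,000+ views 2019-07-24 14:13:35
    Here we need to print the most frequent words (top ten) in Hamlet. In English, words are clearly separated from one another, some by spaces, some by commas...... Here we need to split out the individual words, so we have to take everything that can...
  • Python: merge several TXT files and count word frequencies!

    1,000+ views 2019-08-23 14:45:09
    Requirement: analyze three English articles and compute the 10 most frequent words. The logic is clear and simple: use Python to read several txt files, write their contents into a new txt file, then run a word-frequency count on the new file to get the final result. Code...
  • Find the most frequent words in an English passage def get_text(): text = open('eng.txt','r').read() text = text.lower() # convert every word to lowercase for ch in '!@#$%^&*()_+-{}[]|\<>?/.,`~':# remove noise, normalize...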
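
Two of the problem statements in the list above are fully specified even though their code is cut off: the most frequent word outside a banned list, and the most and least frequent words with lexicographic tie-breaking. The sketch below shows one possible way to handle each. The function names and the letters-only tokenizing rule are assumptions made for illustration; they are not taken from the truncated articles.

```python
import re
from collections import Counter


def most_common_word(paragraph, banned):
    """Return the most frequent lowercase word that is not in banned."""
    # Assumption: a word is a run of ASCII letters; punctuation separates words.
    words = re.findall(r"[a-z]+", paragraph.lower())
    banned_set = set(banned)
    counts = Counter(w for w in words if w not in banned_set)
    return max(counts, key=counts.get)


def extremes(words):
    """Return ((word, count) of the most frequent, (word, count) of the least).

    Ties for most frequent go to the lexicographically largest word;
    ties for least frequent go to the lexicographically smallest word.
    """
    counts = Counter(words)
    most = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    least = min(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return most, least


text = "Bob hit a ball, the hit BALL flew far after it was hit."
print(most_common_word(text, ["hit"]))             # ball
print(extremes(["b", "a", "b", "a", "c", "d"]))    # (('b', 2), ('c', 1))
```

Both helpers assume at least one countable word is present; with empty input, max() and min() raise ValueError.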

Keywords: counting the most frequent words, Python