  • Word cloud of character names in Romance of the Three Kingdoms
    2021-11-21 20:13:32

    Task: write a program that reads the full text of Romance of the Three Kingdoms from the file "三国演义.txt", merges the different names of common characters, lists the 10 to 20 most frequent names, and generates a word cloud (which may take different shapes).

    import jieba  # third-party library for Chinese word segmentation
    import wordcloud
    from matplotlib import pyplot

    mk = pyplot.imread('caochao.jpg')  # image whose shape is used as the word-cloud mask
    with open('三国演义.txt', 'r', encoding='utf-8') as f:
        txt = f.read()
    # Words that are not character names but occur frequently
    excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此", "商议", "如何", "主公", "军士", "左右", "军马", "引兵", "次日", "大喜", "天下", "东吴",
                "于是", "今日", "不敢", "魏兵", "陛下", "一人", "都督", "人马", "不知", "汉中", "只见", "众将", "后主", "蜀兵", "上马", "大叫", "太守", "此人",
                "夫人", "先主", "后人", "背后", "城中", "天子", "一面", "何不", "大军", "忽报", "先生", "百姓", "何故", "然后", "先锋", "不如", "赶来", "原来",
                "令人", "江东", "下马", "喊声", "正是", "徐州", "忽然", "因此", "成都", "不见", "未知", "大败", "大事", "之后", "一军", "引军", "起兵", "军中",
                "接应", "进兵", "大惊", "可以", "以为", "大怒", "不得", "心中", "下文", "一声", "追赶", "粮草", "曹兵", "一齐", "分解", "回报", "分付", "只得",
                "出马", "三千", "大将", "许都", "随后", "报知", "前面", "之兵", "且说", "众官", "洛阳", "领兵", "何人", "星夜", "精兵", "城上", "之计", "不肯",
                "相见", "其言", "一日", "而行", "文武", "襄阳", "准备", "若何", "出战", "亲自", "必有", "此事", "军师", "之中", "伏兵", "祁山", "乘势", "忽见",
                "大笑", "樊城", "兄弟", "首级", "立于", "西川", "朝廷", "三军", "大王", "传令", "当先", "五百", "一彪", "坚守", "此时", "之间", "投降", "五千",
                "埋伏", "长安", "三路", "遣使", "英雄","回见","大将军","是夜","小路","望见","无不","有人","马下","必然","将士","甘宁","下寨","杀出","诸葛","中原",
                "屯兵","邓艾","蛮兵","之意","城下","前来","武士","城外","出迎","本部","两路","一阵","连夜","四面","奔走","交锋","冀州","细作","使者","江南","杀来",
                "人报","而出","心腹","何处","皇叔","众人","当日","吴兵","兴兵","何以","如之奈何","先帝","江夏","前进","国家","城门","杀入","两军","来到","厮杀","两个","拜谢",
                "岂可","慌忙","饮酒","为首","性命","进发","谋士","此言"}

    # Precise mode: split the text into words with no redundant tokens; returns a list
    words = jieba.lcut(txt)
    # Dictionary mapping each word to its number of occurrences
    counts = {}
    for word in words:
        # Skip single-character tokens, then merge the aliases of the same character
        if len(word) == 1:
            continue
        elif word == "诸葛亮" or word == "孔明曰":
            rword = "孔明"
        elif word == "关公" or word == "云长":
            rword = "关羽"
        elif word == "玄德" or word == "玄德曰":
            rword = "刘备"
        elif word == "孟德" or word == "丞相":
            rword = "曹操"
        else:
            rword = word
        # Add 1 to the existing count, starting from 0 if the key is new
        counts[rword] = counts.get(rword, 0) + 1
    # Remove the excluded words; pop() with a default avoids a KeyError
    # for any excluded word that never appears in the text
    for word in excludes:
        counts.pop(word, None)
    # Convert to a list of (word, count) pairs and sort by count;
    # reverse=True sorts in descending order, so the first item is the most frequent
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    # Print the 40 most frequent words and their counts
    name = []
    times = []
    for word, count in items[:40]:
        print("{0:<10}{1:>5}".format(word, count))
        name.append(word)
        times.append(count)
    # Word cloud
    w = wordcloud.WordCloud(
        font_path='songti.TTF',  # font that contains Chinese glyphs
        background_color="white",  # background color
        max_words=1000,  # maximum number of words
        max_font_size=100,  # maximum font size
        random_state=50,  # seed for the random layout and colors, for reproducible output
        mask=mk
    )
    # Weight each name by its count; joining the unique names into one string
    # would give every name the same frequency
    w.generate_from_frequencies(dict(zip(name, times)))
    w.to_file("ciyun.png")
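    The counting loop above can also be written with `collections.Counter` from the standard library. The following is only a sketch under stated assumptions: the helper name `top_names`, the `aliases` dictionary, and the tiny token list are made up for illustration, and the tokens stand in for the output of `jieba.lcut`.

```python
from collections import Counter


def top_names(tokens, aliases, excludes, n):
    """Count multi-character tokens after alias merging; return the n most common."""
    counter = Counter()
    for token in tokens:
        if len(token) == 1:
            continue  # single characters are punctuation or function words
        name = aliases.get(token, token)  # map an alias to its canonical name
        if name in excludes:
            continue  # skip frequent words that are not character names
        counter[name] += 1
    return counter.most_common(n)


# Made-up tokens standing in for a segmented text
tokens = ["孔明", "曰", "诸葛亮", "将军", "玄德", "刘备", "孔明曰"]
aliases = {"诸葛亮": "孔明", "孔明曰": "孔明", "玄德": "刘备"}
print(top_names(tokens, aliases, {"将军"}, 2))  # [('孔明', 3), ('刘备', 2)]
```

    Filtering the excluded words while counting means no key ever has to be deleted afterwards, so a word that is missing from the text cannot cause an error.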

  • Python: making a word cloud from Romance of the Three Kingdoms

    2021-05-17 20:02:39

    Task:

    Write a program that reads the full text of Romance of the Three Kingdoms from threekingdoms.txt, merges the different names of common characters, generates a word cloud, and lists the 5 most frequent words.
    Example: '玄德', '刘备', '玄德曰', '刘皇叔', and '皇叔' all refer to the same person.
    A dictionary can hold the names that need to be merged.
    dupDict={'曹操' : ['孟德','丞相'],
             '玄德' : ['刘备','皇叔','刘皇叔','玄德曰'],
             '云长' : ['关羽','关云长','关公'],
             '孔明' : ['诸葛亮','诸葛','孔明曰'],
             '张飞' : ['翼德'],
             '赵云' : ['子龙','赵子龙'],
             '周瑜' : ['公瑾','都督']}
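    A dictionary of this shape can be inverted once into an alias-to-name lookup, which replaces a long `if`/`elif` chain with a single `dict.get` call. This is a minimal sketch; the helper names `build_alias_lookup` and `normalize` are hypothetical, and only two entries of the dictionary are used.

```python
def build_alias_lookup(dup_dict):
    """Invert {canonical: [aliases]} into {alias: canonical}."""
    lookup = {}
    for canonical, aliases in dup_dict.items():
        lookup[canonical] = canonical  # the canonical name maps to itself
        for alias in aliases:
            lookup[alias] = canonical
    return lookup


def normalize(tokens, lookup):
    """Replace every known alias with its canonical name; keep other tokens."""
    return [lookup.get(token, token) for token in tokens]


dup_dict = {'曹操': ['孟德', '丞相'],
            '玄德': ['刘备', '皇叔', '刘皇叔', '玄德曰']}
lookup = build_alias_lookup(dup_dict)
print(normalize(['孟德', '刘备', '玄德', '赵云'], lookup))
# ['曹操', '玄德', '玄德', '赵云']
```

    Adding a new alias then only requires editing the dictionary, not the loop.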

    First:

    Install jieba, wordcloud, and imageio (which provides imread).

    Code:

    import jieba
    from wordcloud import WordCloud
    from imageio import imread
    # Read the file
    filename = 'threekingdoms.txt'
    with open(filename, encoding='utf-8') as f:
        mytext = f.read()
    # Segment the text with jieba
    words = jieba.lcut(mytext)
    # Drop unimportant words
    removes = {'将军','二人','却说','次日','主公','不能','不可','罗贯中','上卷','长江','滚滚','逝水'}
    words = [word for word in words if word not in removes]

    # Merge the different names of the same character
    ls = []
    for i in words:
        if len(i) == 1:
            continue
        elif i in ['孟德','丞相']:
            ls.append('曹操')
        elif i in ['诸葛亮','诸葛','孔明曰']:
            ls.append('孔明')
        # '玄德' is included here so that it is counted together with '刘备'
        elif i in ['玄德','皇叔','刘皇叔','玄德曰']:
            ls.append('刘备')
        elif i in ['关羽','关云长','关公']:
            ls.append('云长')
        elif i in ['公瑾','都督']:
            ls.append('周瑜')
        elif i in ['子龙','赵子龙']:
            ls.append('赵云')
        elif i in ['翼德']:
            ls.append('张飞')
        else:
            ls.append(i)
    words = ls

    # Word frequency count with a dictionary
    word_count = {}
    for word in words:
        word_count[word] = word_count.get(word, 0) + 1
    print(sorted(word_count.items(), key=lambda kv: kv[1], reverse=True)[:5])


    # Set the mask image and the font; raw strings keep the backslashes in the paths literal
    mask = imread(r"PeiJian\Zhangfei.png")
    w = WordCloud(font_path=r"PeiJian\STXINWEI.TTF", background_color='white', width=1000, height=500, max_words=2000, mask=mask)
    # The segmented words must be joined with a separator into one string,
    # otherwise the word cloud cannot be drawn
    w.generate(" ".join(words))
    # Save the resulting image
    w.to_file(r'PeiJian\1.png')
  • open('三国演义.txt', 'r' ,encoding='gb18030').read() remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可", "主公", "蜀兵", "只见", "如何", "商议", "都督", "一", ...

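    1. Install the required third-party libraries

    jieba (a third-party library for Chinese word segmentation)

    pyecharts (a data visualization library)

    Installing libraries with PyCharm

    Open PyCharm and choose Settings under the [File] menu.

    In the page that opens, click the [+] button on the right. In the next page, search for the library you want in the box at the top, then install it.

    2. Write the code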

    import jieba  # import the segmentation library

    print("人物出现次数前十名:")  # "Top ten characters by number of appearances:"
    with open('三国演义.txt', 'r', encoding='gb18030') as f:
        txt = f.read()
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == "诸葛亮" or word == "孔明曰":
            rword = "孔明"
        elif word == "关公" or word == "云长":
            rword = "关羽"
        elif word == "玄德" or word == "玄德曰":
            rword = "刘备"
        elif word == "孟德" or word == "丞相":
            rword = "曹操"  # group the different names of one person together
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    for word, count in items[:10]:
        print("{}:{}".format(word, count))  # print the top ten

    The result is shown in the figure below:

    Many of these entries are not character names, so they need to be removed. The revised code is as follows:

    import jieba  # import the segmentation library

    print("人物出现次数前十名:")  # "Top ten characters by number of appearances:"
    with open('三国演义.txt', 'r', encoding='gb18030') as f:
        txt = f.read()
    # Words to exclude, collected by running the program repeatedly
    remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
              "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
              "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
              "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == "诸葛亮" or word == "孔明曰":
            rword = "孔明"
        elif word == "关公" or word == "云长":
            rword = "关羽"
        elif word == "玄德" or word == "玄德曰":
            rword = "刘备"
        elif word == "孟德" or word == "丞相":
            rword = "曹操"  # group the different names of one person together
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1
    for word in remove:
        counts.pop(word, None)  # delete the word if present; a missing word is ignored
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    for word, count in items[:10]:
        print("{}:{}".format(word, count))  # print the top ten

    The output is shown in the figure below.

    Now every entry is a character name.
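    One detail of the removal step is worth a small illustration: deleting a key with `del` raises `KeyError` when an excluded word never occurs in the text, whereas `dict.pop` with a default does not. The sketch below uses a made-up count dictionary, and the helper name `remove_words` is hypothetical.

```python
def remove_words(counts, excludes):
    """Return a copy of counts without the excluded words."""
    cleaned = dict(counts)
    for word in excludes:
        cleaned.pop(word, None)  # no error if the word is absent
    return cleaned


counts = {"曹操": 3, "将军": 5}   # made-up counts
excludes = {"将军", "却说"}       # "却说" does not occur in counts

try:
    del counts["却说"]
except KeyError:
    print("del raised KeyError for a missing word")

print(remove_words(counts, excludes))  # {'曹操': 3}
```

    Working on a copy also leaves the original counts available for later inspection.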

    To export the data, use the following code:

    import jieba  # import the segmentation library

    print("人物出现次数前十名:")  # "Top ten characters by number of appearances:"
    with open('三国演义.txt', 'r', encoding='gb18030') as f:
        txt = f.read()
    # Words to exclude, collected by running the program repeatedly
    remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
              "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
              "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
              "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == "诸葛亮" or word == "孔明曰":
            rword = "孔明"
        elif word == "关公" or word == "云长":
            rword = "关羽"
        elif word == "玄德" or word == "玄德曰":
            rword = "刘备"
        elif word == "孟德" or word == "丞相":
            rword = "曹操"  # group the different names of one person together
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1
    for word in remove:
        counts.pop(word, None)  # delete the word if present; a missing word is ignored
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    # Export the data
    # Mode "w" overwrites the file, so repeated runs do not append duplicate lines
    with open("三国人物出场次数.txt", "w", encoding='utf-8') as fo:
        for word, count in items[:10]:
            fo.write(word)
            fo.write(':')  # separate name and count with a colon
            fo.write(str(count))
            fo.write('\n')  # newline

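    Now run the program to check that the data was exported. The result is shown in the figure below.

    A file named 三国人物出场次数.txt has been created, and it contains the data computed above.

    3. Data visualization

    Visualization needs data first, so convert the exported data into a dictionary. The code is as follows: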

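    # Convert the data in the text file into a dictionary
    dic = {}
    keys = []  # records the order in which the names were read
    with open('三国人物出场次数.txt', 'r', encoding='utf-8') as fr:
        for line in fr:
            v = line.strip().split(':')
            dic[v[0]] = int(v[1])  # store the count as a number for plotting
            keys.append(v[0])
    print(dic)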

    The output is as follows:
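    The export and re-import steps form a round trip, which can be checked in isolation. The sketch below is an assumption-laden illustration: `save_counts` and `load_counts` are hypothetical helper names, the data is made up, and a temporary directory is used so that no real file is touched.

```python
import tempfile
from pathlib import Path


def save_counts(path, items):
    """Write (name, count) pairs as 'name:count' lines, overwriting the file."""
    with open(path, "w", encoding="utf-8") as f:
        for name, count in items:
            f.write(f"{name}:{count}\n")


def load_counts(path):
    """Read 'name:count' lines back into a dict with integer counts."""
    result = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            name, count = line.rsplit(":", 1)
            result[name] = int(count)
    return result


items = [("曹操", 3), ("孔明", 2)]  # made-up counts
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "counts.txt"
    save_counts(path, items)
    loaded = load_counts(path)
print(loaded)  # {'曹操': 3, '孔明': 2}
```

    Because dictionaries keep insertion order, the names come back in the order in which they were written, so a separate list of keys is not strictly needed.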

    Plotting with pyecharts

    First import the modules:

    from pyecharts import options as opts
    from pyecharts.charts import Bar

    The code is as follows:

    # Plotting
    list1 = list(dic.keys())
    list2 = list(dic.values())  # use the dictionary contents as the plot data
    c = (
        Bar()
        .add_xaxis(list1)
        .add_yaxis("人物出场次数", list2)
        .set_global_opts(
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        )
        .render("人物出场次数可视化图.html")
    )

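    After running the program, a file named 人物出场次数可视化图.html appears in the directory, as shown in the figure below.

    Open it in a browser to see the data presented as a chart.

    4. Complete code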

    # Python code for counting character appearances in Romance of the Three Kingdoms
    import jieba  # import the segmentation library
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    print("人物出现次数前十名:")  # "Top ten characters by number of appearances:"
    with open('三国演义.txt', 'r', encoding='gb18030') as f:
        txt = f.read()
    # Words to exclude, collected by running the program repeatedly
    remove = {"将军", "却说", "不能", "后主", "上马", "不知", "天子", "大叫", "众将", "不可",
              "主公", "蜀兵", "只见", "如何", "商议", "都督", "一人", "汉中", "人马",
              "陛下", "魏兵", "天下", "今日", "左右", "东吴", "于是", "荆州", "如此",
              "大喜", "引兵", "次日", "军士", "军马", "二人", "不敢"}
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == "诸葛亮" or word == "孔明曰":
            rword = "孔明"
        elif word == "关公" or word == "云长":
            rword = "关羽"
        elif word == "玄德" or word == "玄德曰":
            rword = "刘备"
        elif word == "孟德" or word == "丞相":
            rword = "曹操"  # group the different names of one person together
        else:
            rword = word
        counts[rword] = counts.get(rword, 0) + 1
    for word in remove:
        counts.pop(word, None)  # delete the word if present; a missing word is ignored
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)

    # Export the data
    # Mode "w" overwrites the file, so repeated runs do not append duplicate lines
    with open("三国人物出场次数.txt", "w", encoding='utf-8') as fo:
        for word, count in items[:10]:
            fo.write(word)
            fo.write(':')  # separate name and count with a colon
            fo.write(str(count))
            fo.write('\n')  # newline

    # Convert the data in the text file into a dictionary
    dic = {}
    keys = []  # records the order in which the names were read
    with open('三国人物出场次数.txt', 'r', encoding='utf-8') as fr:
        for line in fr:
            v = line.strip().split(':')
            dic[v[0]] = int(v[1])  # store the count as a number for plotting
            keys.append(v[0])
    print(dic)

    # Plotting
    list1 = list(dic.keys())
    list2 = list(dic.values())  # use the dictionary contents as the plot data
    c = (
        Bar()
        .add_xaxis(list1)
        .add_yaxis("人物出场次数", list2)
        .set_global_opts(
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
        )
        .render("人物出场次数可视化图.html")
    )


    Source: https://www.cnblogs.com/ke-wu-a/p/14026658.html

  • Word frequency statistics for characters in Romance of the Three Kingdoms, part 4

    2018-09-08 11:30:46

    Source of the exercise: Python语言程序设计 (Python Programming)

    Instructors: 嵩天, 黄天羽, 礼欣

    Download link for the novel's text: https://python123.io/resources/pye/threekingdoms.txt

    Part 3: https://blog.csdn.net/Mzjuser/article/details/82527464

    Part 2: https://blog.csdn.net/Mzjuser/article/details/82527412

    Part 1: https://blog.csdn.net/Mzjuser/article/details/82527289

    Code

    import jieba
    path = 'C:\\Users\\Desktop\\三国演义.txt'
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    # Segment the text with jieba
    words = jieba.lcut(text)
    # Words that are not character names and must be excluded
    excludes = ['将军','却说','二人','不可','荆州','不能','如此','商议','如何'
                ,'军士','左右','天下','次日','大喜','引兵','军马','东吴','于是'
                ,'今日','不敢','魏兵','陛下','一人','人马','汉中','不知','只见',
                '众将','蜀兵','上马','大叫']
    # Dictionary that stores each word and its number of occurrences
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        elif word == '诸葛亮'or word == '孔明曰':
            rword = '孔明'
        # Titles such as '主公', '都督', and '太守' are mapped to a single person
        # below, which is only a rough approximation
        elif word == '玄德'or word == '玄德曰' or word == '主公':
            rword = '刘备'
        elif word == '孟德'or word == '丞相':
            rword = '曹操'
        elif word == '关公'or word == '云长':
            rword = '关羽'
        elif word == '都督':
            rword = '周瑜'
        elif word == '后主':
            rword = '刘禅'
        elif word == '太守':
            rword = '刘度'
        else:
            rword = word
        counts[rword] = counts.get(rword,0) + 1
    # Remove the words that are not names; pop() ignores words that never occur
    for word in excludes:
        counts.pop(word, None)
    items = list(counts.items())
    # Sort by the second value of each item (the count), largest first
    items.sort(key = lambda x:x[1],reverse=True)
    for word,count in items[:15]:
        # Word left-aligned in 10 columns, count right-aligned in 5, padded with spaces
        print("{0:<10}{1:>5}".format(word,count))
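    The format specification `{0:<10}` pads by number of characters, while many terminals draw each Chinese character two columns wide, so the count column may not line up. One possible workaround, sketched here with the standard `unicodedata` module, pads by approximate display width. The helper names `display_width` and `pad_display` are hypothetical and the counts are made up.

```python
import unicodedata


def display_width(text):
    """Approximate terminal width: wide and fullwidth characters take 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def pad_display(text, width):
    """Left-align text in a field that is `width` terminal columns wide."""
    return text + " " * max(0, width - display_width(text))


# Made-up counts, for illustration only
for word, count in [("曹操", 12), ("诸葛亮", 7), ("abc", 3)]:
    print("{}{:>5}".format(pad_display(word, 10), count))
```

    Whether the columns actually align still depends on the terminal font, so this is an approximation and not a guarantee.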
    

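    Results


    Other approaches

    Alternatively, count how often each character appears by searching for the characters' names directly (this requires detailed knowledge of the characters in the novel), and then sort the counts.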
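    A minimal sketch of this name-based approach, assuming a hand-written table of names and aliases: the function name `count_names`, the alias table, and the sample sentence are all made up for illustration, and `str.count` is used in place of word segmentation.

```python
def count_names(text, name_aliases):
    """Count each character by searching the text for every one of its aliases."""
    counts = {}
    for canonical, aliases in name_aliases.items():
        counts[canonical] = sum(text.count(alias) for alias in aliases)
    # Sort by count, largest first; ties keep the order of the alias table
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample sentence and alias table
sample = "玄德曰:云长何在?孔明笑曰:关公已去。玄德大喜。"
name_aliases = {"刘备": ["刘备", "玄德"],
                "关羽": ["关羽", "关公", "云长"],
                "孔明": ["孔明", "诸葛亮"]}
print(count_names(sample, name_aliases))
# [('刘备', 2), ('关羽', 2), ('孔明', 1)]
```

    Aliases that contain one another (for example '云长' inside '关云长') would be counted twice if both were listed, so the alias table should avoid such overlaps.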

