  • Removing punctuation in Python

    2020-08-05 15:02:29

    import re

    def removePunctuation(query):
        # Remove punctuation, keeping only ASCII letters, digits, and Chinese characters
        if query:
            rule = re.compile(u"[^a-zA-Z0-9\u4e00-\u9fa5]")
            query = rule.sub('', query)
        return query

  • While processing text recently, I found that splitting sentences, removing punctuation, and checking whether a string contains Chinese characters come up often. Here is the code I use:

    • Splitting sentences
    import re
    def split_sentences(line):
        # re.split drops the delimiters, so only the text between them remains
        parts = re.split(r'[。!;?,]', line.strip())
        # Discard empty or single-character fragments
        return [part.strip() for part in parts if len(part.strip()) > 1]
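    • Checking whether a string contains Chinese characters
    def is_contain_chinese(check_str):
        """
        Check whether a string contains Chinese characters.
        :param check_str: {str} the string to check
        :return: {bool} True if it contains Chinese characters, otherwise False
        """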
        for ch in check_str:
            if u'\u4e00' <= ch <= u'\u9fff':
                return True
        return False
    
    • Removing punctuation
    def remove_punctuation(line):
        rule = re.compile(r"[^a-zA-Z0-9\u4e00-\u9fa5]")
        line = rule.sub('',line)
        return line
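
The pattern above also deletes whitespace, so "hello, world" collapses to "helloworld". Below is a minimal sketch of a variant that keeps word boundaries, assuming that is the behaviour you want. The name `remove_punctuation_keep_spaces` is hypothetical and does not come from the original post.

```python
import re

# Anything that is not an ASCII letter, a digit, a CJK ideograph in
# U+4E00..U+9FA5, or whitespace is treated as punctuation here.
_NON_WORD = re.compile(r"[^a-zA-Z0-9\u4e00-\u9fa5\s]")


def remove_punctuation_keep_spaces(line):
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    cleaned = _NON_WORD.sub(" ", line)
    return " ".join(cleaned.split())


print(remove_punctuation_keep_spaces("Hello, world! 你好,世界。"))
# Hello world 你好 世界
```

Substituting a space rather than an empty string keeps hyphenated or comma-joined words from fusing together; the final `split`/`join` removes the extra spaces this leaves behind.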


  • I'm trying to remove a list of punctuation characters from my text file, but I have a problem with words joined by a hyphen. For example, for the word "post-trauma" I get "posttrauma", whereas I want to get "post" and "trauma".

    My code is:

    import re

    punct = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']

    with open(myFile, "r") as f:
        text = f.read()

    remove = '|'.join(REMOVE_LIST)  # list of words to remove
    regex = re.compile(r'(' + remove + r')', flags=re.IGNORECASE)
    out = regex.sub("", text)
    delta = " ".join(out.split())
    txt = "".join(c for c in delta if c not in punct)

    Is there a way to solve it?

    Solution

    I believe you can just call the string's replace method on delta, so your last line would become the following:

    txt = "".join(c for c in delta.replace("-", " ") if c not in punct)

    This means all the hyphens in your text will become spaces, so the words will be treated as if they were separate.
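
A minimal sketch of that suggestion, using an inline sample string instead of the asker's file. The function name `strip_punct` and the sample text are illustrative and not part of the question.

```python
PUNCT = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']


def strip_punct(text, punct=PUNCT):
    """Turn hyphens into spaces, then drop the remaining listed punctuation."""
    spaced = text.replace("-", " ")
    kept = "".join(c for c in spaced if c not in punct)
    # Normalise whitespace so removed characters do not leave double spaces.
    return " ".join(kept.split())


print(strip_punct('post-trauma care (100% "urgent")!'))
# post trauma care 100 urgent
```

Because the hyphen is replaced before the filter runs, leaving '-' in the list is harmless: no hyphen is left for the filter to delete.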

  • How to remove punctuation in Python

    Published: 2020-08-25 10:33:44

    Source: 亿速云

    Views: 181

    Author: 小新

    This article walks through several ways to remove punctuation in Python, shared here as a practical reference.

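    The methods are as follows.

    Method 1:

    str.isalnum:

    S.isalnum() -> bool

    Return value: True if the string has at least one character and all of its characters are letters or digits, otherwise False.

    Example:

    >>> string = "Special $#! characters spaces 888323"
    >>> ''.join(e for e in string if e.isalnum())
    'Specialcharactersspaces888323'

    This keeps only letters and digits, which is heavy-handed: spaces and every other symbol are removed too. On Python 2 byte strings it typically strips Chinese characters as well. On Python 3, str.isalnum() is Unicode-aware, so Chinese characters count as letters and are kept.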
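
A minimal sketch of a gentler version of the isalnum filter, assuming Python 3, where `str.isalnum()` is Unicode-aware. The helper name `keep_alnum` and its `keep_spaces` parameter are hypothetical.

```python
def keep_alnum(text, keep_spaces=True):
    """Keep letters and digits; optionally keep whitespace as well."""
    kept = "".join(
        ch for ch in text
        if ch.isalnum() or (keep_spaces and ch.isspace())
    )
    # Collapse the gaps left behind by removed characters.
    return " ".join(kept.split()) if keep_spaces else kept


print(keep_alnum("Special $#! characters 888323"))
# Special characters 888323
print(keep_alnum("你好, world!", keep_spaces=False))
# 你好world
```

The second call shows that Chinese characters pass the `isalnum()` test in Python 3, so only the comma, the space, and the exclamation mark are dropped.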

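    Method 2:

    string.punctuation

    import re, string

    s = "string. With. Punctuation?"  # Sample string

    # Variant 1 (Python 3; the original Python 2 form was s.translate(string.maketrans("", ""), string.punctuation)):
    out = s.translate(str.maketrans("", "", string.punctuation))

    # Variant 2 (Python 2 only; str.translate takes a single argument in Python 3):
    # out = s.translate(None, string.punctuation)

    # Variant 3:
    exclude = set(string.punctuation)
    out = ''.join(ch for ch in s if ch not in exclude)

    # Variant 4:
    >>> for c in string.punctuation:
    ...     s = s.replace(c, "")
    ...
    >>> s
    'string With Punctuation'

    # Variant 5:
    out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
    ## re.escape escapes every character in the string that could be interpreted as a regex operator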

    # Variant 6:
    # string.punctuation only covers ASCII. For broader (but slower) coverage, use the unicodedata module:
    from unicodedata import category

    s = 'String — with - «Punctuation »...'

    out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
    print('Stripped', out)
    # Output: only the ASCII '-' and '...' are removed; '—', '«' and '»' remain

    out = ''.join(ch for ch in s if category(ch)[0] != 'P')
    print('Stripped', out)
    # Output: all punctuation is removed, including '—', '«' and '»'

    # For Python 3 str (or Python 2 unicode) values, str.translate() only takes a mapping: code points (integers) are looked up in it, and anything mapped to None is removed.
    # To remove ASCII punctuation, use:
    import string

    remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
    s.translate(remove_punct_map)

    # The two-argument translate call does not work in Python 3, because translate no longer accepts a second argument.
    # To cover all Unicode punctuation, build the table from character categories:
    import unicodedata
    import sys

    tbl = dict.fromkeys(i for i in range(sys.maxunicode + 1) if unicodedata.category(chr(i)).startswith('P'))

    def remove_punctuation(text):
        return text.translate(tbl)

    Method 3:

    re

    Example:

    import re

    s = "string. With. Punctuation?"
    s = re.sub(r'[^\w\s]', '', s)
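
A short sketch of how the `[^\w\s]` pattern behaves on mixed text, assuming Python 3 `str` input, where `\w` matches Unicode letters and digits plus the underscore. The wrapper name `strip_non_word` is illustrative.

```python
import re


def strip_non_word(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


print(strip_non_word("string. With. Punctuation?"))
# string With Punctuation
print(strip_non_word("你好,世界! snake_case"))
# 你好世界 snake_case
```

Two details are visible here: Chinese characters survive while full-width punctuation is removed, and the underscore is kept because it counts as a word character. If underscores should also go, the pattern `[^\w\s]|_` is one option.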

    Benchmark:

    import re, string, timeit

    s = "string. With. Punctuation"
    exclude = set(string.punctuation)
    table = str.maketrans("", "", string.punctuation)  # Python 3 form of the original string.maketrans("", "")
    regex = re.compile('[%s]' % re.escape(string.punctuation))

    def test_set(s):
        return ''.join(ch for ch in s if ch not in exclude)

    def test_re(s):
        return regex.sub('', s)

    def test_trans(s):
        return s.translate(table)

    def test_repl(s):
        for c in string.punctuation:
            s = s.replace(c, "")
        return s

    print("sets :", timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
    print("regex :", timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
    print("translate :", timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
    print("replace :", timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

    Output (seconds per 1,000,000 calls, as reported in the original post, whose code was written for Python 2; numbers will differ by machine and Python version):

    # sets : 19.8566138744
    # regex : 6.86155414581
    # translate : 2.12455511093
    # replace : 28.4436721802

    That covers the methods for removing punctuation in Python; hopefully it serves as a useful reference.

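  • Removing all punctuation from text in Python

    10,000+ views 2019-09-19 16:48:17
    There are many ways to remove punctuation; here are two that I use often. 1. Use a ready-made punctuation constant (here imported from the third-party zhon package, not the standard library), which can strip all Chinese punctuation. import re,string from zhon.hanzi import punctuation text = " Hello, world! 这,是:我;第!一...
  • Since the goal is to remove punctuation, regular expressions are the natural tool. A regular expression (often abbreviated regex, regexp, or RE in code) is a concept from computer science. Regular expressions are typically used to search for and replace text that matches...
  • If speed is not a concern, another option is: exclude = set(string.punctuation) s = ''.join(ch for ch in s if ch not in exclude) This is faster than calling s.replace for each character, but not as fast as non-pure-Python approaches (such as regular expressions...
  • Python: removing punctuation with regular expressions

    1,000+ views 2020-10-12 14:53:43
    During text preprocessing, regular expressions can be used to strip punctuation from the text. re.sub(pattern, repl, string, count=0, flags=0) The code for removing common symbols is as follows: r = "[A-Za-z0-9_.!+-=——,$%^,。...
  • Removing punctuation from text in Python

    1,000+ views 2019-11-01 18:10:30
         ...To eliminate the influence of punctuation, it has to be removed. The punctuation constant in Python's string module contains all English punctuation marks, so a replace() pass is enough to remove them: Example 1: import stri...
  • Removing all Chinese and English punctuation in Python

    1,000+ views 2020-04-08 15:46:45
    The punctuation constant in Python's string module contains all English punctuation marks, so a replace() pass is enough to remove them. Code example: import string stri = 'today is friday, so happy..!!!' punctuation_string = string.punctuation ...
  • Import the punctuation constant that ships with Python's string module: from string import punctuation s='不错!今天,也要"加油"哦?' dicts={i:'' for i in punctuation} punc_table=str.maketrans(dicts) new_s=s.translate(punc_table) print(...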
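  • Filtering English punctuation out of a Python string, for example s = """ this is a example, and i want to miss punctuation. ..Removing all punctuation from text with a regular expression in Python; the current approach is: line=re.sub(r'[{}]+'.format(punctuation),'',...
  • The punctuation marks that can appear in Chinese text come from many sources, so take extra care when matching them to avoid missing any. Here are two methods I use for handling Chinese punctuation. A set of Chinese punctuation: the more common marks include !?。"#$%&'()*+,...
  • Removing punctuation from text """ punctuation = r"~!@#$%^&*()_+`{}|\[\]\:\";\-\\\='<>?,./,。、《》?;:‘“{【】}|、!@#¥%……&*()——+=-" content = re.sub(r'[{}]+'.format(punctuati
  • import re
    def remove(text):
        # Character class of digits plus ASCII and Chinese punctuation to delete
        remove_chars = '[0-9’!"#$%&\'()*+,-./:;<=>?@,。?★、…【】《》?“”‘’![\\]^_`{|}~]+'
        return re.sub(remove_chars, '', text)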
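
Several of the snippets above combine ASCII punctuation with a hand-written list of Chinese punctuation. A minimal sketch of that idea using a single translation table is shown below. `CJK_PUNCT` is a small, hand-picked sample rather than an exhaustive list, and both it and `strip_all_punct` are hypothetical names.

```python
import string

# Hand-picked full-width and CJK punctuation; extend this for your own data.
CJK_PUNCT = ",。!?;:、“”‘’()《》【】…—"

# With two empty strings, str.maketrans maps every character in the third
# argument to None, which makes str.translate delete it.
_TABLE = str.maketrans("", "", string.punctuation + CJK_PUNCT)


def strip_all_punct(text):
    """Delete ASCII punctuation and the sampled Chinese punctuation."""
    return text.translate(_TABLE)


print(strip_all_punct('不错!今天,也要"加油"哦?Good job, team.'))
# 不错今天也要加油哦Good job team
```

A fixed list is predictable but easy to leave incomplete; filtering on the Unicode category via `unicodedata.category` trades that predictability for broader coverage.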
