  • I am trying to split a comma-delimited string in Python. The tricky part is that some fields themselves contain commas and are enclosed in quotes (" or '). The resulting fields should have the surrounding quotes removed. Some fields can also be empty.

    Example:

    hey,hello,,"hello,world",'hey,world'

    needs to be split into 5 parts like below

    ['hey', 'hello', '', 'hello,world', 'hey,world']

    Any ideas/thoughts/suggestions/help with how to go about solving the above problem in Python would be much appreciated.

    Thank You,

    Vish

    Solution

    (Edit: The original answer had trouble with empty fields on the edges due to the way re.findall works, so I refactored it a bit and added tests.)

        import re

        def parse_fields(text):
            r"""
            >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\''))
            ['hey', 'hello', '', 'hello,world', 'hey,world']
            >>> list(parse_fields('hey,hello,,"hello,world",\'hey,world\','))
            ['hey', 'hello', '', 'hello,world', 'hey,world', '']
            >>> list(parse_fields(',hey,hello,,"hello,world",\'hey,world\','))
            ['', 'hey', 'hello', '', 'hello,world', 'hey,world', '']
            >>> list(parse_fields(''))
            ['']
            >>> list(parse_fields(','))
            ['', '']
            >>> list(parse_fields('testing,quotes not at "the" beginning \'of\' the,string'))
            ['testing', 'quotes not at "the" beginning \'of\' the', 'string']
            >>> list(parse_fields('testing,"unterminated quotes'))
            ['testing', '"unterminated quotes']
            """
            pos = 0
            exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")
            while True:
                m = exp.search(text, pos)
                result = m.group(2)
                separator = m.group(3)

                yield result

                if not separator:
                    break

                pos = m.end(0)

        if __name__ == "__main__":
            import doctest
            doctest.testmod()

    (['"]?) matches an optional single- or double-quote.

    (.*?) matches the string itself. This is a non-greedy match, to match as much as necessary without eating the whole string. This is assigned to result, and it's what we actually yield as a result.

    \1 is a backreference, to match the same single- or double-quote we matched earlier (if any).

    (,|$) matches the comma separating each entry, or the end of the line. This is assigned to separator.

    If separator is empty (and therefore falsy), no comma followed the field, so we have reached the end of the string and are done. Otherwise, we move the start position to where the regex match ended (m.end(0)) and continue the loop.
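
    To make the role of each group concrete, here is a small sketch that records what the three groups capture on every pass of the loop. The helper name trace_fields is hypothetical and exists only for this illustration; the pattern is the same one used in parse_fields above.

```python
import re

# Same pattern as in parse_fields: optional quote, lazy body,
# backreference to the same quote, then a comma or end of string.
exp = re.compile(r"""(['"]?)(.*?)\1(,|$)""")

def trace_fields(text):
    """Hypothetical helper: return (quote, field, separator) for each match."""
    pos = 0
    steps = []
    while True:
        m = exp.search(text, pos)
        steps.append((m.group(1), m.group(2), m.group(3)))
        if not m.group(3):      # empty separator means end of string
            break
        pos = m.end(0)          # resume right after the comma
    return steps

for step in trace_fields('hey,"hello,world",\'hey,world\''):
    print(step)
# ('', 'hey', ',')
# ('"', 'hello,world', ',')
# ("'", 'hey,world', '')
```

    The second and third rows show the backreference at work: the lazy body keeps growing past the embedded comma until it reaches the same quote character that opened the field.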

  • If you have:

        import pandas

        data = pandas.DataFrame({"composers": [
            "Joseph Haydn",
            "Wolfgang Amadeus Mozart",
            "Antonio Salieri",
            "Eumir Deodato"]})

    and suppose you want only the first names (not middle names such as Amadeus):

        data.composers.str.split(r'\s+').str[0]

    gives:

        0      Joseph
        1    Wolfgang
        2     Antonio
        3       Eumir
        dtype: object

    You can assign this to a new column in the same DataFrame:

        data['firstnames'] = data.composers.str.split(r'\s+').str[0]

    The last names would be:

        data.composers.str.split(r'\s+').str[-1]

    which gives:

        0      Haydn
        1     Mozart
        2    Salieri
        3    Deodato
        dtype: object

    For everything except the last name, you can apply " ".join(..) to all but the last element ([:-1]) of each row:

        data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))

    which gives:

        0              Joseph
        1    Wolfgang Amadeus
        2             Antonio
        3               Eumir
        dtype: object

  • Data:

        11,"American President, The (1995)",Comedy|Drama|Romance

    Ideal result after splitting:

        11
        "American President, The (1995)"
        Comedy|Drama|Romance

    Actual result with a plain split:

        11
        "American President
        The (1995)"
        Comedy|Drama|Romance

    How do we avoid splitting inside the quotes? One approach:

        import shlex

        # item_file is the path to the data file, defined elsewhere
        with open(item_file, encoding='UTF-8') as fp:
            for line in fp:
                lex = shlex.shlex(line)
                lex.whitespace = ','          # treat only commas as separators
                lex.quotes = '"'              # double quotes protect embedded commas
                lex.whitespace_split = True   # split on the separators only
                itemlist = list(lex)
                if len(itemlist) < 3:
                    continue
                itemid, title, genres = itemlist[:3]

    For the sample line, itemlist is shown below. Note that the quotes around the title and the trailing newline are kept:

        ['11', '"American President, The (1995)"', 'Comedy|Drama|Romance\n']
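
    As an alternative to shlex, the standard library's csv module also leaves commas inside double-quoted fields alone, and it additionally drops the quotes and the line terminator. The sketch below is not the original author's method; the helper name parse_line and the inline sample string are assumptions made for illustration.

```python
import csv

# Sample line taken from the data shown above.
sample = '11,"American President, The (1995)",Comedy|Drama|Romance\n'

def parse_line(line):
    """Hypothetical helper: parse one CSV line into a list of fields."""
    # csv.reader accepts any iterable of strings, so a one-element list works.
    return next(csv.reader([line]))

itemid, title, genres = parse_line(sample)
print(itemid)   # 11
print(title)    # American President, The (1995)
print(genres)   # Comedy|Drama|Romance
```

    When reading a whole file, passing the open file object straight to csv.reader avoids building a reader per line.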
    
  • If there are recursively nested expressions, you can split on the commas and use pyparsing to verify that the expressions match:

        import pyparsing as pp

        def CommaSplit(txt):
            ''' Replicate the function of str.split(',') but do not split on nested expressions or in quoted strings'''
            com_lok = []
            comma = pp.Suppress(',')
            # note the location of each comma outside an ignored expression:
            comma.setParseAction(lambda s, lok, toks: com_lok.append(lok))
            ident = pp.Word(pp.alphas+"_", pp.alphanums+"_")  # python identifier
            ex1 = (ident + pp.nestedExpr(opener='<', closer='>'))  # Ignore everything inside nested '< >'
            ex2 = (ident + pp.nestedExpr())  # Ignore everything inside nested '( )'
            ex3 = pp.Regex(r'("|\').*?\1')  # Ignore everything inside "'" or '"'
            atom = ex1 | ex2 | ex3 | comma
            expr = pp.OneOrMore(atom) + pp.ZeroOrMore(comma + atom)
            try:
                result = expr.parseString(txt)
            except pp.ParseException:
                return [txt]
            else:
                return [txt[st:end] for st, end in zip([0]+[e+1 for e in com_lok], com_lok+[len(txt)])]

        tests = '''\
        obj<1, 2, 3>, x(4, 5), "msg, with comma"
        nesteobj<1, sub<6, 7>, 3>, nestedx(4, y(8, 9), 5), "msg, with comma"
        nestedobj<1, sub<6, 7>, 3>, nestedx(4, y(8, 9), 5), 'msg, with comma', additional<1, sub<6, 7>, 3>
        bare_comma<1, sub(6, 7), 3>, x(4, y(8, 9), 5), , 'msg, with comma', obj<1, sub<6, 7>, 3>
        bad_close<1, sub<6, 7>, 3), x(4, y(8, 9), 5), 'msg, with comma', obj<1, sub<6, 7>, 3)
        '''

        for te in tests.splitlines():
            result = CommaSplit(te)
            print(te, '==>\n\t', result)

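    This prints each test line followed by its split result.

    The current behavior is just like '(something does not split), b, "in quotes", c'.split(','), including keeping the leading spaces and the quotes. Stripping the quotes and leading spaces from the fields is trivial.

    Change the else under the try to the following (this assumes strip_fields has been added as a parameter of CommaSplit, for example def CommaSplit(txt, strip_fields=False)):

            else:
                rtr = [txt[st:end] for st, end in zip([0]+[e+1 for e in com_lok], com_lok+[len(txt)])]
                if strip_fields:
                    rtr = [e.strip().strip('\'"') for e in rtr]
                return rtr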
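
    For readers who would rather not depend on pyparsing, the same idea (split only on commas that sit outside brackets and quotes) can be sketched with a plain character scan and a stack. This is a different technique from the pyparsing grammar above, not a drop-in replacement: the function name split_top_level and its parameters are hypothetical, it does not require an identifier before a bracket, and it treats every apostrophe as a quote character.

```python
def split_top_level(txt, pairs=(("<", ">"), ("(", ")")), quotes="'\""):
    """Split txt on commas that are outside brackets and quoted strings.

    Returns [txt] unchanged when brackets or quotes are unbalanced,
    mirroring the fallback used by CommaSplit above.
    """
    openers = {o: c for o, c in pairs}   # opening bracket -> expected closer
    closers = {c for _, c in pairs}
    stack = []                           # closers we are still waiting for
    quote = None                         # active quote character, if any
    fields, start = [], 0
    for i, ch in enumerate(txt):
        if quote:
            if ch == quote:              # closing quote ends the quoted run
                quote = None
        elif ch in quotes:
            quote = ch
        elif ch in openers:
            stack.append(openers[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return [txt]             # mismatched closing bracket
        elif ch == "," and not stack:
            fields.append(txt[start:i])  # top-level comma: cut here
            start = i + 1
    if quote or stack:
        return [txt]                     # something was left unclosed
    fields.append(txt[start:])
    return fields

print(split_top_level('obj<1, 2, 3>, x(4, 5), "msg, with comma"'))
# ['obj<1, 2, 3>', ' x(4, 5)', ' "msg, with comma"']
```

    As with the pyparsing version, leading spaces and quotes are kept in the fields, so the same strip step can be applied afterwards if needed.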

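  • In this article we discuss how to split strings in Python. The .split() method: in Python, strings are represented as immutable str objects. The str class comes with many string methods that let you manipulate strings. The .split() method returns a list of substrings separated by a delimiter. It takes the following...
  • Python: splitting a string on commas while ignoring commas inside quotes. The problem: when reading data from txt, csv and similar files, we often need to split the string we read in according to its columns. For example, suppose the following string is stored in a csv-format file. Q9UI32,“Glutaminase liver...
  • If the first part does not match (i.e. there is no quoted string), the second part ([^\r\n\t\f ,]+) matches everything that is not whitespace or a comma. So it ignores your delimiters but matches everything else. import re rows = [""" 5,'THISMORE"THAN4','/,',4.....
  • I want to plot data with PyQtGraph, so I am trying to convert a string read from a sensor (288 comma-separated values) into a NumPy array. I have tried several different approaches, but none has worked. This code for reading from the sensor works well: #Read the line of data from the ...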
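  • How to split strings in Python

    1,000+ reads 2019-07-24 21:07:58
    In this article we discuss how to split strings in Python. The .split() method: in Python, strings are represented as immutable str objects. The str class comes with many string methods that let you...
  • Methods for splitting strings in Python

    1,000+ reads 2021-01-14 00:14:52
    In this article we discuss how to split strings in Python; to install Python, see the guide to installing Python 3.6 on CentOS 7/Ubuntu 16.04/Debian 9/macOS. The .split() method: in Python, strings are represented as immutable str objects, and the str class comes with many string methods,...
  • I want to plot data with PyQtGraph, so I am trying to convert a string read from a sensor (288 comma-separated values) into a NumPy array. However, I have tried several different methods and none has worked yet. Reading from the sensor with this code works well: #Read the line of data from the...
  • Converting a comma-separated string to a list in Python. Given a string that is a sequence of several values separated by commas: mStr = '192.168.1.1,192.168.1.2,192.168.1.3' How do you convert the string to a list? mStr = ['192.168.1.1', '192.168.1.2'...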
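  • Python: split a list of strings and build a dictionary

    1,000+ reads 2021-01-14 23:08:56
    I have this list: lst= ['1 5','1 12','1 55','2 95', '2 66', '3 45',...] As you can see, each item consists of 2 numbers, the second number is at most 4 characters long, and they are separated by a space. I want to turn it into a dictionary like this: dct={1:{'doc0005','doc0012'...
  • I want my Python function to split a sentence (the input) and store each word in a list. My current code splits the sentence but does not store the words as a list. How do I do that? def split_line(text): # split the text words = text.split() # for ...
  • See the English answer > How to convert a string list into an integer in python (6). I want to store a list of integers as a field in a model. Since Django does not provide a field for this by default, I am using one named x... In my view, I then take this string and create a ... from it
  • a = "dog, cat; cat, dog" b1 = a.split(' ') b2 = a.split('... cat', ' dog'] # ['dog, cat', ' cat, dog']
  • Function: split(). Python has two functions, split() and os.path.split(), which work as follows. split() splits a string: it slices the string on the specified delimiter and returns the resulting list of strings. os.path.split() splits a path into the directory part and the file name. 1. Function...
  • I guess that by commas you mean inverted commas, that is, quotation marks. Then with this strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff" you should get ["Hello (Test1 test2) (Hello1 hello2) other_stuff"] because everything is enclosed in inverted commas. Most likely...
  • C++: read a file and split strings on commas. #include <iostream> #include <fstream> // read the path data from a txt file bool getPathData(char *pathname) { std::ifstream openfile; // open the file openfile.open...
  • Remove the unused whitespace at the start or end of a string, keeping only the part in the middle that has content. 2. Methods for removing whitespace: the methods used are strip, lstrip and rstrip (strip means to remove). Usage: string.strip():...
  • 4.1 Basic string operations: all the standard sequence operations (indexing, slicing, multiplication, membership checks, length, maximum, minimum) apply to strings. Strings are immutable, so all item assignments and slice assignments are illegal. 4.2 Formatting strings: 1. Using string...
  • I have already looked at several Stack Overflow ... similar to this... I have a column made up of the following strings: lineup 0 'FLEX Nick Mullens FLEX Raheem Mostert FLEX Deebo Samuel FLEX Cole Beasley FLEX Jordan Reed CPT Robbie Gould' 1 'FLEX Nick Mull...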
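  • Python: split a string on commas into a list

    1,000+ reads 2020-08-05 17:19:05
    The value starts as one long string and becomes a list after this runs: b[3]['group_wxindex'][0]['wxindex_str'].split(',')
  • However, because the string type is used so often, Python provides a great many string operations, and in day-to-day coding we deal with strings all the time. So strings get a section of their own here. A string is enclosed in a pair of quotes (single or double both work) ...
  • This article covers splitting strings in Python on any number of delimiters (with code). 1. Requirement: we need to split a string into separate fields, but the delimiters (and the ...
  • The usual way to split a string in Python is to call the string's str.split method directly, but it accepts only one delimiter; to split on several delimiters you need re.split (the split method for regular expressions). str.split: the function prototype of the string split method is as follows...
  • Original title: Tips and tricks - 5 examples of splitting strings in Python. In this Python tutorial we will learn the Python string split function. Unlike len, some functions are specific to strings. To use a string function, type the name of the string, a dot, the name of the function, and the ... that the function needs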
