  • Python fuzzy matching

    2017-09-25 16:21:55
    import re
    
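    def fuzzyfinder(input, collection, accessor=lambda x: x):
        """
        Args:
            input (str): A partial string, typically typed by a user.
            collection (iterable): Items to be filtered based on `input`.
            accessor (callable): Maps each item to the string that is
                                 searched (defaults to the item itself).
        Returns:
            suggestions (generator): A generator that yields the items of
                `collection` containing the characters of `input` in order,
                best matches first.
        """
        suggestions = []
        input = str(input) if not isinstance(input, str) else input
        # 'abc' -> 'a.*?b.*?c': the characters must appear in this order,
        # with anything (matched lazily) in between
        pat = '.*?'.join(map(re.escape, input))
        regex = re.compile(pat)
        for item in collection:
            r = regex.search(accessor(item))
            if r:
                # Rank key: length of the matched span (shorter = tighter match),
                # then where it starts (earlier is better), then the string itself
                suggestions.append((len(r.group()), r.start(), accessor(item), item))

        # Sort on the rank key only, so the items themselves never need to be comparable
        return (z[-1] for z in sorted(suggestions, key=lambda z: z[:3]))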

  • Python fuzzy matching: glob, re, fnmatch

    2013-11-28 10:32:35
    '''
    fnmatch module: support for Unix shell-style wildcards
    Pattern Meaning 
    *       matches everything 
    ?       matches any single character 
    [seq]   matches any character in seq 
    [!seq]  matches any character not in seq 

    '''

    import os
    import fnmatch

    # Print the .py files in the current directory
    for name in os.listdir('.'):
        if fnmatch.fnmatch(name, '*.py'):
            print(name)

    '''
    glob module: finds all pathnames that match a Unix shell-style pattern
    '''

    import os
    import glob

    # Print the absolute path of every non-hidden entry in the current directory
    for f in glob.glob(os.path.join(os.path.abspath('.'), '*')):
        print(f)
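The fnmatch table lists four wildcard forms, but the loop above only uses `*`. The sketch below tries each form on a made-up list of names. It uses fnmatch.fnmatchcase, which is always case-sensitive. fnmatch.fnmatch normalizes case on case-insensitive platforms such as Windows.

```python
from fnmatch import fnmatchcase

# Made-up file names for illustration
names = ["a.py", "ab.py", "b.txt", "c1.py", "x.pyc"]


def matching(pattern):
    """Names that match a shell-style `pattern` (case-sensitive on every OS)."""
    return [n for n in names if fnmatchcase(n, pattern)]


print(matching("*.py"))      # ['a.py', 'ab.py', 'c1.py']  (* = anything)
print(matching("?.py"))      # ['a.py']                    (? = exactly one character)
print(matching("[ab]*.py"))  # ['a.py', 'ab.py']           ([seq] = one character from seq)
print(matching("[!ab]*"))    # ['c1.py', 'x.pyc']          ([!seq] = one character not in seq)
```

The whole name must match: `*.py` rejects `x.pyc` because the pattern is anchored at both ends.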


    Python's regular expressions are similar to Perl's.


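    re uses '\' as its escape character, and Python string literals also use '\' for escapes, so to match a literal '\' you have to write the pattern as '\\\\'.
    The regular expression itself must be \\, and in an ordinary string literal each \ has to be written as \\.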
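A quick check of this escaping rule against a made-up Windows-style path:

```python
import re

path = "C:\\temp"   # ordinary literal: "\\" is ONE backslash, so path has 7 characters

# The regex engine must receive the two characters \\ (an escaped backslash),
# which in an ordinary string literal is spelled "\\\\"
pattern = "\\\\"
print(len(pattern))               # 2
print(re.findall(pattern, path))  # ['\\']  (one match: the single backslash)
```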


    For regex patterns we can use raw strings, in which \ is treated as an ordinary character. So r'\n' is two characters,
    '\' and 'n', whereas in an ordinary string '\n' is a newline.
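Comparing lengths shows the difference. It also shows why regex patterns are usually written as raw strings:

```python
import re

raw = r"\n"      # two characters: a backslash and the letter n
plain = "\n"     # one character: a newline
print(len(raw), len(plain))  # 2 1

# A raw string lets a regex escape be written once: r"\d+" is the same
# string as the ordinary literal "\\d+"
print(r"\d+" == "\\d+")                 # True
print(re.findall(r"\d+", "a1b22c333"))  # ['1', '22', '333']
```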


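    Special characters in re:
    '.' matches any character except a newline; with the DOTALL flag set, it matches any character, including a newline.
    '^' matches the start of the string.
    '$' matches the end of the string (or just before a newline at the end of the string).


    '*' matches 0 or more repetitions.
    '+' matches 1 or more repetitions.
    '?' matches 0 or 1 repetition.
    *?, +?, ?? non-greedy versions of '*', '+', '?'.
    {m} exactly m repetitions.
    {m,n} m to n repetitions (no space after the comma); omitting m means m = 0, omitting n means no upper limit.
    {m,n}? non-greedy version of {m,n}, like *?, +?, ??.
    []  matches any one character from a set.
    |   A|B matches either A or B.
    ()  defines a group.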
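Here are a few of these operators in action. The sample strings are arbitrary:

```python
import re

s = "<a><b>"
print(re.match(r"<.*>", s).group())   # <a><b>  (greedy: as much as possible)
print(re.match(r"<.*?>", s).group())  # <a>     (non-greedy: as little as possible)

print(re.findall(r"a{2,3}", "aaaaa"))         # ['aaa', 'aa']
print(re.findall(r"cat|dog", "cat dog cow"))  # ['cat', 'dog']
print(re.search(r"([a-z]+)@([a-z]+)", "mail: bob@example").groups())  # ('bob', 'example')

print(re.match(r".", "\n"))                         # None: '.' skips newlines by default
print(re.match(r".", "\n", re.DOTALL) is not None)  # True
print(re.search(r"^\d+$", "2017\n").group())        # 2017  ('$' also matches before a final newline)
```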


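    \d  matches a decimal digit
    \D  matches any non-digit character
    \s  matches a whitespace character
    \S  matches any non-whitespace character
    \w  matches a word character: a letter, a digit, or the underscore (Unicode-aware for str patterns in Python 3)
    \W  matches any character that is not a word character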
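Here are the character classes applied to a sample string. Note that `\w` also matches the underscore:

```python
import re

s = "id_7: a b"
print(re.findall(r"\d", s))       # ['7']
print(re.findall(r"\D", "a1b2"))  # ['a', 'b']
print(re.findall(r"\s", s))       # [' ', ' ']
print(re.findall(r"\S+", s))      # ['id_7:', 'a', 'b']
print(re.findall(r"\w+", s))      # ['id_7', 'a', 'b']
print(re.findall(r"\W", s))       # [':', ' ', ' ']
```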


    import re

    url = 'http://www.contoso.com:8080/letters/readme.html'
    # Groups: scheme ('http:'), host, port, path
    obj = re.match(r'(.*)//(.*):(\d+)(.*)', url)
    print(obj.groups())

    lstStr = ['local 127.0.0.1', 'Lucy 192.168.130.2', 'Link 192.168.130.224']
    for s in lstStr:
        # Dots are escaped: an unescaped '.' would match any character
        obj = re.match(r'.*?(\d+\.\d+\.\d+\.\d+)', s)
        print(obj.groups())



  • A small Python function that uses fuzzywuzzy to match hand-entered data against standard data

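    In bioinformatics, data that has been entered by hand usually contains a few errors. Yet this metadata has to be analyzed together with the result data, and it matters a great deal. With only a few records you can fix them one by one; with many, you are in trouble. So what can you do?

    For work, I wrote a small Python function that uses fuzzywuzzy fuzzy matching to reconcile hand-entered data with standard data. The idea: first try an exact match; if there is none, fuzzy-match against the whole target set, pick the highest-scoring candidate and submit it to the user for checking, then output the results.

    Sharing it here in case it helps.

    Library version: fuzzywuzzy 0.16.0

    This uses fuzz.ratio; you can choose another scorer to suit your data (see the documentation).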
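The original post's code was shown as images, which are not reproduced here. Below is a rough sketch of the workflow it describes: exact match first, otherwise take the best fuzzy score and hand it to a person to check. It uses the standard library's difflib.SequenceMatcher as a stand-in for fuzz.ratio, so it runs without extra packages. The names `match_to_standard` and `needs_review` and the sample species names are hypothetical. Instead of prompting the user interactively, the sketch just flags fuzzy matches for review.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score in [0, 100]; a difflib-based stand-in for fuzz.ratio (assumption)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_to_standard(raw_values, standard_values):
    """Hypothetical helper: (raw, matched, score, needs_review) for each entered value."""
    standard_list = list(standard_values)
    standard_set = set(standard_list)
    results = []
    for raw in raw_values:
        if raw in standard_set:
            # Step 1: exact match, nothing for a person to check
            results.append((raw, raw, 100, False))
            continue
        # Step 2: no exact match, so score every standard value and keep the best
        best = max(standard_list, key=lambda s: similarity(raw, s))
        # Step 3: flag the fuzzy pick so a person can confirm or correct it
        results.append((raw, best, similarity(raw, best), True))
    return results


# Hypothetical example: species names typed by hand vs. a reference list
standard = ["Homo sapiens", "Mus musculus", "Danio rerio"]
entered = ["Homo sapiens", "Mus muscullus", "Danio reri"]
for row in match_to_standard(entered, standard):
    print(row)
```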

  • fuzzywuzzy: a fuzzy matching module for Python

    fuzzywuzzy is a fuzzy matching module for Python. Install it first (pip install fuzzywuzzy).

    Example:

    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process

    state_to_code = {"VERMONT": "VT", "GEORGIA": "GA", "IOWA": "IA", "Armed Forces Pacific": "AP", "GUAM": "GU",
                     "KANSAS": "KS", "FLORIDA": "FL", "AMERICAN SAMOA": "AS", "NORTH CAROLINA": "NC", "HAWAII": "HI",
                     "NEW YORK": "NY", "CALIFORNIA": "CA", "ALABAMA": "AL", "IDAHO": "ID",
                     "FEDERATED STATES OF MICRONESIA": "FM",
                     "Armed Forces Americas": "AA", "DELAWARE": "DE", "ALASKA": "AK", "ILLINOIS": "IL",
                     "Armed Forces Africa": "AE", "SOUTH DAKOTA": "SD", "CONNECTICUT": "CT", "MONTANA": "MT",
                     "MASSACHUSETTS": "MA",
                     "PUERTO RICO": "PR", "Armed Forces Canada": "AE", "NEW HAMPSHIRE": "NH", "MARYLAND": "MD",
                     "NEW MEXICO": "NM",
                     "MISSISSIPPI": "MS", "TENNESSEE": "TN", "PALAU": "PW", "COLORADO": "CO",
                     "Armed Forces Middle East": "AE",
                     "NEW JERSEY": "NJ", "UTAH": "UT", "MICHIGAN": "MI", "WEST VIRGINIA": "WV", "WASHINGTON": "WA",
                     "MINNESOTA": "MN", "OREGON": "OR", "VIRGINIA": "VA", "VIRGIN ISLANDS": "VI", "MARSHALL ISLANDS": "MH",
                     "WYOMING": "WY", "OHIO": "OH", "SOUTH CAROLINA": "SC", "INDIANA": "IN", "NEVADA": "NV",
                     "LOUISIANA": "LA",
                     "NORTHERN MARIANA ISLANDS": "MP", "NEBRASKA": "NE", "ARIZONA": "AZ", "WISCONSIN": "WI",
                     "NORTH DAKOTA": "ND",
                     "Armed Forces Europe": "AE", "PENNSYLVANIA": "PA", "OKLAHOMA": "OK", "KENTUCKY": "KY",
                     "RHODE ISLAND": "RI",
                     "DISTRICT OF COLUMBIA": "DC", "ARKANSAS": "AR", "MISSOURI": "MO", "TEXAS": "TX", "MAINE": "ME"
                     }

    def studyfuzzy():
        choices = state_to_code.keys()

        # Best match for a misspelled name
        print(process.extractOne("Minnesotta", choices=choices))
        # score_cutoff: return the match only if it scores at least this much, else None
        print(process.extractOne("Minnesotta", choices=choices, score_cutoff=80))
        print(process.extractOne("Minnesotta", choices=choices, score_cutoff=96))

        # In Python 3, keys()/values()/items() already return views,
        # replacing Python 2's viewkeys()/viewvalues()/viewitems()
        print(state_to_code.keys())
        print(state_to_code.values())
        print(state_to_code.items())

        print(process.extractOne("AlaBAMMazzz", choices=choices, score_cutoff=80))
        print(process.extractOne("AlaBAMMazzz", choices=choices))


    if __name__ == "__main__":
        studyfuzzy()
    In[6]: from fuzzywuzzy import fuzz
    
    In[7]: from fuzzywuzzy import process
    
    In[8]: state_to_code = {...}  # same dictionary as in the script above

     

    Out[19]: ('MINNESOTA', 95)
    In[20]: process.extractOne("Minnesotta", choices=state_to_code.keys(), score_cutoff=80)
    
    Out[20]: ('MINNESOTA', 95)
    In[21]: process.extractOne("Minnesotta", choices=state_to_code.keys(), score_cutoff=96)
    
    In[22]: process.extractOne("AlaBAMMazzz", choices=state_to_code.keys(), score_cutoff=80)
    
    In[23]: process.extractOne("AlaBAMMazzz",choices=state_to_code.keys())
    
    Out[23]: ('ALABAMA', 78)

     

    Reposted from: https://www.cnblogs.com/laoduan/p/python1.html
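In the session above, extractOne produces no output (it returns None) once score_cutoff is higher than the best score. Below is a minimal sketch of that behavior that uses difflib's SequenceMatcher as the scorer. It is not the library's algorithm. fuzzywuzzy's default scorer for extractOne is fuzz.WRatio, and its default processor lower-cases and strips non-alphanumeric characters, so scores generally differ. `extract_one` is a hypothetical name, not part of fuzzywuzzy. With this scorer, the Minnesotta example happens to score 95 as well.

```python
from difflib import SequenceMatcher


def extract_one(query, choices, score_cutoff=0):
    """Hypothetical stand-in for process.extractOne.

    Returns the best (choice, score) pair, or None when no choice scores
    at least `score_cutoff`. Scores use difflib, not fuzzywuzzy's WRatio.
    """
    best = None
    for choice in choices:
        # Compare case-insensitively, loosely imitating fuzzywuzzy's default preprocessing
        score = round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio())
        # Keep the highest score that clears the cutoff (ties keep the first one seen)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


states = ["MINNESOTA", "MONTANA", "ALABAMA"]
print(extract_one("Minnesotta", states))                   # ('MINNESOTA', 95)
print(extract_one("Minnesotta", states, score_cutoff=96))  # None
```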

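  • Example: if you want to work with the files under some root directory whose names contain 'test', this article may help. import glob path = "/Users/name/...temDir = glob.glob(path) # fuzzy match on the files in this directory; the result is a list of file paths ...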
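Here is a minimal sketch of the task this teaser describes: finding the files directly under a directory whose names contain 'test'. It assumes Python 3.4+ for glob.escape. The function name and its arguments are hypothetical. glob's `*` does not match names that start with a dot, so hidden files are skipped.

```python
import glob
import os


def find_files_containing(root, keyword):
    """Hypothetical helper: sorted paths directly under `root` whose names contain `keyword`."""
    # glob.escape keeps any *, ? or [ in root/keyword from being read as wildcards
    pattern = os.path.join(glob.escape(root), "*" + glob.escape(keyword) + "*")
    return sorted(glob.glob(pattern))


print(find_files_containing(".", "test"))
```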
  • Fuzzy matching on Python lists

    2020-03-26 13:55:09
  • Implementing fuzzy matching in Python

    2018-10-17 22:41:45
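    Fuzzy string matching in Python: '?' stands for exactly one character and '*' for any number of characters. Given a literal string such as avdjnd and a wildcard pattern such as *dj?dji?ejj, decide whether they match. Print "Yes" if they do, otherwise print "No".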
  • Implementing fuzzy matching in Python

    2018-10-17 22:39:30
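    Problem: fuzzy matching, where '?' stands for one character and * for any number of characters. Given a literal string such as avdjnd and a wildcard pattern such as *dj?dji?ejj, decide whether they match. Print "Yes" if they match, otherwise print "No" (for readability, the code outputs True or ...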
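One standard way to solve this problem is dynamic programming over prefixes of the pattern. The sketch below is not the original author's code, since only the problem statement survives here. `wildcard_match` is a hypothetical name. `dp[j]` records whether the part of the pattern processed so far matches the first j characters of the text.

```python
def wildcard_match(text, pattern):
    """True if `pattern` ('?' = any one char, '*' = any run, even empty) matches ALL of `text`."""
    n = len(text)
    # Before any pattern character: the empty pattern matches only the empty text
    dp = [True] + [False] * n
    for p in pattern:
        if p == "*":
            # '*' matches the empty string (dp[j]) or absorbs one more character (new[j - 1])
            new = [dp[0]] + [False] * n
            for j in range(1, n + 1):
                new[j] = dp[j] or new[j - 1]
        else:
            # A literal or '?' must consume exactly one character of the text
            new = [False] * (n + 1)
            for j in range(1, n + 1):
                new[j] = dp[j - 1] and (p == "?" or p == text[j - 1])
        dp = new
    return dp[n]


print("Yes" if wildcard_match("avdjnd", "*dj?dji?ejj") else "No")  # No
print("Yes" if wildcard_match("avdjnd", "*dj?d") else "No")        # Yes
```

Each pattern character is checked once against every text position. The run time is therefore O(len(pattern) * len(text)), and only one row of O(len(text)) booleans is kept in memory.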
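  • Use the Python libraries fuzzywuzzy and difflib: both can do word-level fuzzy matching with a configurable similarity threshold, for tasks such as keyword extraction, address matching and grammar checking. 2. fuzzywuzzy pip install fuzzywuzzy from fuzzywuzzy import process ...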
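On the difflib side, difflib.get_close_matches already combines scoring and a threshold. `cutoff` is the minimum SequenceMatcher ratio (0 to 1), and `n` caps how many candidates are returned, best first. The address strings below are made up for illustration.

```python
from difflib import get_close_matches

# Made-up reference addresses
addresses = ["Beijing Road", "Nanjing Road", "Shanghai Street"]

# n: return at most this many candidates (best first)
# cutoff: minimum similarity ratio, from 0 to 1
print(get_close_matches("Nanjng Road", addresses, n=1, cutoff=0.8))  # ['Nanjing Road']
print(get_close_matches("Xyz", addresses, n=1, cutoff=0.8))          # []
```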
  • Python fuzzywuzzy: fuzzy matching and similarity scores. from fuzzywuzzy import fuzz from fuzzywuzzy import process 1: simple match a = fuzz.ratio('this is a shot','this is a shat') Out[37]: 93 2: partial match b = fuzz....
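The 93 above comes from a SequenceMatcher-style ratio. Without python-Levenshtein installed, fuzzywuzzy falls back to difflib.SequenceMatcher, and fuzz.ratio returns round(100 * ratio()). The sketch below mirrors that fallback path; `ratio_like_fuzz` is a hypothetical name. With python-Levenshtein installed, scores can occasionally differ.

```python
from difflib import SequenceMatcher


def ratio_like_fuzz(s1, s2):
    """Hypothetical helper mirroring fuzz.ratio's difflib fallback: round(100 * ratio())."""
    # ratio() = 2 * M / T, where M = matched characters and T = total length of both strings
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


# M = 13 matched characters, T = 14 + 14 = 28, so 2 * 13 / 28 is about 0.929
print(ratio_like_fuzz("this is a shot", "this is a shat"))  # 93
```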
  • a = ['123', '666', '355']
    b = ['2', '5']
    # Print every element of a that contains an element of b as a substring
    for sub in b:
        for s in a:
            if sub in s:
                print(s)
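  • FuzzyWuzzy, a fuzzy string matching library for Python. In computer science, fuzzy string matching is a technique for finding strings that match a pattern approximately (rather than exactly). In other words, fuzzy string matching is a search that still finds matches even when the user misspells...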
  • #!/usr/bin/python # -*- coding: utf-8 -*- import os pathlog = "/usr/local/nginx/log" files = os.listdir(pathlog) for f in files: if 'stat' in f and f.endswith('.log'): print ("Foun...
  • Python fuzzy string matching: Fuzzywuzzy

    2019-05-17 20:37:14
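    Python's fuzzywuzzy module can compute the similarity between two strings, and it also provides ranking functions for finding the most similar sentences in a large candidate set. (1) Installation: pip install fuzzywuzzy (2) API overview: two modules, fuzz and process; fuzz mainly...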
  • fuzzywuzzy: fuzzy string matching in Python
  • Implementing fuzzy matching in Python

    2017-11-12 13:46:00
    #!/usr/bin/env python # _*_ coding:utf-8 _*_ import re # data = [ 'tantianranphone118', ...
