  • python filtering

    2019-01-03 19:46:00

    data = [0, 9, 3, 2, 1, 3, 2, -2, -1]
    result = [x for x in data if x >= 0]

    Reposted from: https://www.cnblogs.com/sea-stream/p/10216476.html
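For comparison, a minimal sketch of the same filter written with the built-in filter() function; the lambda is just one way of expressing the "x >= 0" condition:

```python
data = [0, 9, 3, 2, 1, 3, 2, -2, -1]

# List comprehension: keep the non-negative numbers
result = [x for x in data if x >= 0]

# Equivalent filter() call; list() materialises the lazy filter object
same = list(filter(lambda x: x >= 0, data))

print(result)          # [0, 9, 3, 2, 1, 3, 2]
print(same == result)  # True
```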

  • python filter

    Introduction

    The Python built-in filter() function can be used to create a new iterator from an existing iterable (like a list or dictionary) that will efficiently filter out elements using a function that we provide. An iterable is a Python object that can be “iterated over”, that is, it will return items in a sequence such that we can use it in a for loop.

    The basic syntax for the filter() function is:

    filter(function, iterable)

    This will return a filter object, which is an iterable. We can use a function like list() to make a list of all the items returned in a filter object.
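A short sketch of what that means in practice, using a made-up list of numbers: the filter object is an iterator, so list() or a for loop pulls the items out of it, and once it has been consumed it yields nothing more.

```python
numbers = [4, 7, 10, 13]

evens = filter(lambda n: n % 2 == 0, numbers)
print(type(evens).__name__)  # filter
print(list(evens))           # [4, 10]
print(list(evens))           # []  (the iterator is already exhausted)

# A fresh filter object can be consumed directly by a for loop
for n in filter(lambda n: n % 2 == 0, numbers):
    print(n)                 # prints 4, then 10
```

If the results are needed more than once, convert them to a list a single time and reuse that list.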

    The filter() function can be more memory-efficient than a list comprehension, especially when we’re starting to work with larger data sets. A list comprehension builds a complete new list right away, so after it has finished we have two lists in memory. filter() instead returns a small lazy object that holds a reference to an iterator over the original list and to the provided function; each item is tested only when the next result is requested, which takes up less memory. This is a memory advantage rather than a guaranteed speed-up: calling a Python function for every item can make filter() slower than the equivalent comprehension, and wrapping the result in list() builds the full list anyway.
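A minimal sketch of the difference, assuming a list of one million integers: the comprehension builds all 500,000 matching items at once, while the filter object stays small and only tests items as they are requested. Note that sys.getsizeof() reports the size of the container object itself, not of the items it refers to.

```python
import sys

data = list(range(1_000_000))

# Eager: every even number is tested and stored immediately
evens_list = [n for n in data if n % 2 == 0]

# Lazy: nothing is tested until items are requested
evens_iter = filter(lambda n: n % 2 == 0, data)

print(len(evens_list))  # 500000
print(sys.getsizeof(evens_list) > sys.getsizeof(evens_iter))  # True
print(next(evens_iter), next(evens_iter))  # 0 2
```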

    In this tutorial, we’ll review several ways of using filter(): with a lambda function, with a function we define, with None in place of a function, and with a more complex iterable, a list of dictionaries.

    Using filter() with a Function

    The first argument to filter() is a function, which we use to decide whether to keep or filter out each item. The function is called once for every item in the iterable passed as the second argument, and whenever it returns False (or any other falsy value), that item is dropped. Because this argument is a function, we can pass either a normal function or a lambda function, which is convenient when the expression is simple.

    Following is the syntax of a lambda with filter():

    filter(lambda item: <expression that uses item>, iterable)

    With a list, like the following, we can incorporate a lambda function with an expression against which we want to evaluate each item from the list:

    creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']

    To filter this list to find the names of our aquarium creatures that start with a vowel, we can run the following lambda function:

    print(list(filter(lambda x: x[0].lower() in 'aeiou', creature_names)))

    Here x stands for each item in our list. The expression accesses the first character of each string (the character at index 0), x[0]. Converting it to lowercase ensures that capitalized names such as 'Ashley' and 'Olly' still match the letters in the string 'aeiou'.
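Two details of this expression are worth a sketch. The in operator on strings tests for a substring, so x[0].lower() in 'aeiou' works only because x[0] is a single character, and x[0] raises IndexError for an empty string. A tempting fix, x[:1], would let '' through, because the empty string is a substring of every string. A variant that avoids both, assuming an empty name should simply be skipped, uses str.startswith() with a tuple of prefixes (the empty string added to the list is hypothetical):

```python
creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie', '']

print('' in 'aeiou')  # True: the empty string is a substring of every string

VOWELS = ('a', 'e', 'i', 'o', 'u')

def starts_with_vowel(name):
    # startswith() accepts a tuple of prefixes and simply returns False for ''
    return name.lower().startswith(VOWELS)

print(list(filter(starts_with_vowel, creature_names)))  # ['Ashley', 'Olly']
```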

    Finally we pass the iterable creature_names. As described in the introduction, we apply list() to the result in order to create a list from the iterator that filter() returns.

    The output will be the following:

    Output
    ['Ashley', 'Olly']

    This same result can be achieved using a function we define:

    creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']
    
    def names_vowels(x):
      return x[0].lower() in 'aeiou'
    
    filtered_names = filter(names_vowels, creature_names)
    
    print(list(filtered_names))

    Our function names_vowels defines the expression that we will implement to filter creature_names.

    Again, the output would be as follows:

    Output
    ['Ashley', 'Olly']

    Overall, a lambda function and a regular function produce the same result with filter(). As the filtering expression becomes more complex, defining a regular function becomes more worthwhile, because a named function is easier to read and to test on its own.
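As an illustration of that trade-off, here is a sketch with a hypothetical two-part rule (the name starts with a vowel and has more than four letters). Written as a named function, each condition gets its own line and the function can be tested separately:

```python
creature_names = ['Sammy', 'Ashley', 'Jo', 'Olly', 'Jackie', 'Charlie']

def is_long_name_starting_with_vowel(name):
    """Hypothetical rule: first letter is a vowel AND the name has more than 4 letters."""
    starts_with_vowel = name[:1].lower() in ('a', 'e', 'i', 'o', 'u')
    is_long = len(name) > 4
    return starts_with_vowel and is_long

print(list(filter(is_long_name_starting_with_vowel, creature_names)))  # ['Ashley']
```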

    Using None with filter()

    We can pass None as the first argument to filter() to have the returned iterator filter out any value that Python considers “falsy”. Python treats None, False, numeric zero (such as 0 or 0.0), and anything with a length of 0 (such as an empty string, list, or dictionary) as false, hence the term “falsy.”
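A quick sketch of which values count as falsy. Passing None behaves the same as passing bool, and both match a comprehension that tests each value directly:

```python
values = [0, 0.0, '', [], {}, (), set(), None, False, 1, 'a', [0]]

kept = list(filter(None, values))
print(kept)  # [1, 'a', [0]]  ([0] is a non-empty list, so it is truthy)

print(kept == list(filter(bool, values)) == [v for v in values if v])  # True
```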

    In the following case we want to filter our list to only show the tank numbers at our aquarium:

    aquarium_tanks = [11, False, 18, 21, "", 12, 34, 0, [], {}]

    In this code we have a list containing integers, a boolean value, and empty values (an empty string, an empty list, and an empty dictionary).

    filtered_tanks = filter(None, aquarium_tanks)

    We call filter() with None and pass in the aquarium_tanks list as our iterable. Because None is the first argument, filter() tests the truth value of each item itself and drops the items that are falsy.

    print(list(filtered_tanks))

    Then we pass filtered_tanks to list() so that print() displays the filtered items as a list.

    Here the output shows only the non-zero integers. All the falsy items, that is False, 0, the empty string, the empty list, and the empty dictionary, were removed by filter():

    Output
    [11, 18, 21, 12, 34]

    Note: If we don’t use list() and print filtered_tanks we would receive a filter object something like this: <filter object at 0x7fafd5903240>. The filter object is an iterable, so we could loop over it with for or we can use list() to turn it into a list, which we’re doing here because it’s a good way to review the results.

    With None we have used filter() to quickly remove items from our list that were considered false.
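One caveat: filter(None, ...) would also drop a tank numbered 0, because 0 is falsy. If 0 could be a real tank number, a sketch with an explicit predicate keeps it (assuming here that tank numbers are plain integers):

```python
aquarium_tanks = [11, False, 18, 21, "", 12, 34, 0, [], {}]

def is_tank_number(item):
    # bool is a subclass of int, so False must be excluded explicitly
    return isinstance(item, int) and not isinstance(item, bool)

print(list(filter(is_tank_number, aquarium_tanks)))  # [11, 18, 21, 12, 34, 0]
```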

    Using filter() with a List of Dictionaries

    When we have a more complex data structure, we can still use filter() to evaluate each of the items. For example, with a list of dictionaries we not only want to iterate over each item in the list (each one a dictionary), but we may also want to iterate over each key:value pair in a dictionary in order to evaluate all the data.

    As an example, let’s say we have a list of each creature in our aquarium along with different details about each of them:

    aquarium_creatures = [
      {"name": "sammy", "species": "shark", "tank number": "11", "type": "fish"},
      {"name": "ashley", "species": "crab", "tank number": "25", "type": "shellfish"},
      {"name": "jo", "species": "guppy", "tank number": "18", "type": "fish"},
      {"name": "jackie", "species": "lobster", "tank number": "21", "type": "shellfish"},
      {"name": "charlie", "species": "clownfish", "tank number": "12", "type": "fish"},
      {"name": "olly", "species": "green turtle", "tank number": "34", "type": "turtle"}
    ]

    We want to filter this data by a search string we give to the function. To have filter() access each dictionary and each item in the dictionaries, we construct a nested function, like the following:

    def filter_set(aquarium_creatures, search_string):
        def iterator_func(x):
            for v in x.values():
                if search_string in v:
                    return True
            return False
        return filter(iterator_func, aquarium_creatures)

    We define a filter_set() function that takes aquarium_creatures and search_string as parameters. In filter_set() we pass our iterator_func() as the function to filter(). The filter_set() function will return the iterator resulting from filter().

    The iterator_func() takes x as an argument, which represents an item in our list (that is, a single dictionary).

    Next, the for loop goes through the values of the key:value pairs in the dictionary, and the conditional statement checks whether search_string occurs in v, the current value.

    Like in our previous examples, when the function returns True, filter() keeps that dictionary in its results. Because filter() is lazy, iterator_func() actually runs only when the filter object returned by filter_set() is consumed, for example by list(). We position return False outside of our loop so that every value in a dictionary is checked before the dictionary is rejected, instead of returning after checking only the first value.

    We call filter_set() with our list of dictionaries and the search string we want to find matches for:

    filtered_records = filter_set(aquarium_creatures, "2")

    Once the function completes we have our filter object stored in the filtered_records variable, which we turn into a list and print:

    print(list(filtered_records))

    We’ll receive the following output from this program:

    Output
    [{'name': 'ashley', 'species': 'crab', 'tank number': '25', 'type': 'shellfish'}, {'name': 'jackie', 'species': 'lobster', 'tank number': '21', 'type': 'shellfish'}, {'name': 'charlie', 'species': 'clownfish', 'tank number': '12', 'type': 'fish'}]

    We’ve filtered the list of dictionaries with the search string 2. We can see that the three dictionaries that included a tank number with 2 have been returned. Using our own nested function allowed us to access every item and efficiently check each against the search string.
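The same search can be sketched more compactly with any(), which stops at the first matching value, just as the return True inside the loop does. Wrapping each value in str() is an added assumption, so that records with non-string values (such as a hypothetical integer tank number) can be searched without raising TypeError:

```python
aquarium_creatures = [
    {"name": "sammy", "species": "shark", "tank number": "11", "type": "fish"},
    {"name": "ashley", "species": "crab", "tank number": "25", "type": "shellfish"},
    {"name": "jo", "species": "guppy", "tank number": "18", "type": "fish"},
    {"name": "jackie", "species": "lobster", "tank number": "21", "type": "shellfish"},
    {"name": "charlie", "species": "clownfish", "tank number": "12", "type": "fish"},
    {"name": "olly", "species": "green turtle", "tank number": "34", "type": "turtle"},
]

def filter_set(records, search_string):
    def matches(record):
        # True as soon as one value contains search_string
        return any(search_string in str(value) for value in record.values())
    return filter(matches, records)

print([r["name"] for r in filter_set(aquarium_creatures, "2")])  # ['ashley', 'jackie', 'charlie']

# Hypothetical record with an integer value: a plain "2" in 42 would raise TypeError
print([r["name"] for r in filter_set([{"name": "finn", "tank number": 42}], "2")])  # ['finn']
```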

    Conclusion

    In this tutorial, we’ve learned different ways of using the filter() function in Python. Now you can use filter() with your own function, a lambda function, or None to filter items in data structures of varying complexity.

    Although in this tutorial we printed the results from filter() immediately in list form, in our own programs we would more likely keep the returned filter object and process the data further.

    If you would like to learn more Python, check out our How To Code in Python 3 series and our Python topic page.

    Translated from: https://www.digitalocean.com/community/tutorials/how-to-use-the-python-filter-function

  • Filtering Excel data with Python

    2020-12-25 22:52:35

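    In real business work we may need to clean Excel data. The code below cleans customers' basic personal information: it checks the ID card number, gender, nationality, phone number, occupation, ID card address, residential address, occupation description, and the ID validity start and end dates. Depending on what you need, the checks can all be run together or any of them on its own. The code is as follows: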
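A minimal sketch of the ID card and gender checks as standalone functions, following the rules the script below is meant to apply: 18 characters, 17 digits followed by a digit or "X", and gender taken from the second-to-last digit (odd means 男, male; even means 女, female). The checksum digit is not verified, as in the script; the function names and the sample number are made up:

```python
def id_card_format_ok(id_card):
    """18 characters: 17 digits followed by a digit or 'X' (checksum not verified)."""
    return (len(id_card) == 18
            and id_card[:17].isdecimal()
            and (id_card[17].isdecimal() or id_card[17] == "X"))

def gender_from_id(id_card):
    """Second-to-last digit: odd -> 男 (male), even -> 女 (female)."""
    return "男" if int(id_card[-2]) % 2 == 1 else "女"

sample = "12345678901234567X"  # made-up number, format only
print(id_card_format_ok(sample))  # True
print(gender_from_id(sample))     # 男 (second-to-last digit is 7)
print(id_card_format_ok("1234"))  # False
```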

    Configuration file: config.ini

    [File]
    ### Information about the file to process
    file_name1 = C:\Users\Administrator\Desktop\新建 Microsoft Excel 工作表.xlsx
    sheet_name = Sheet1
    
    [9yaosu]
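    ### Column letter (A-Z) of each field to check, e.g. A, B, C. Leave a value empty to skip that check.
    ### ID card number column
    id_position =S
    ### Gender column
    sex_position =
    ### Nationality column
    nationality_position =R
    ### Phone number column
    phone_position =L
    ### Occupation column
    job_position =
    ### ID card address column
    id_address_position =O
    ### Residential address column
    live_address_position =
    ### Occupation description column
    job_description_position =W
    ### ID validity start/end date columns (fill in both or leave both empty)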
    certificates_start_time_position =J
    certificates_end_time_position =K
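The script maps each column letter to a number with a fixed A-Z dictionary, so columns after Z (AA, AB, ...) cannot be configured, and an unknown letter silently disables that check. A sketch of a more general conversion, reading an inline copy of part of the config with configparser.read_string(); the helper name column_number is hypothetical:

```python
import configparser

# Inline copy of part of config.ini, for illustration only
CONFIG_TEXT = """
[9yaosu]
id_position =S
sex_position =
phone_position =L
"""

def column_number(letters):
    """Hypothetical helper: 'A' -> 1, 'Z' -> 26, 'AA' -> 27; '' -> 0 (check skipped)."""
    number = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {letters!r}")
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

cf = configparser.ConfigParser()
cf.read_string(CONFIG_TEXT)
positions = {key: column_number(value) for key, value in cf.items("9yaosu")}
print(positions)  # {'id_position': 19, 'sex_position': 0, 'phone_position': 12}
```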
    

    Python script: xx.py

    # coding=utf-8
    import configparser
    import os
    import re
    import time
    from xlrd import xldate_as_tuple
    from datetime import datetime
    import openpyxl
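    # xlrd 2.0 and later only read legacy .xls files; opening the .xlsx input used here
    # requires xlrd < 2.0 (or reading the workbook with openpyxl instead)
    import xlrd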
    from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE
    
    
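    def fix_illegal(s):
        # Strip control characters that openpyxl refuses to write into a cell;
        # non-string values (numbers, dates) are returned unchanged
        try:
            a = ILLEGAL_CHARACTERS_RE.sub(r"", s)
        except TypeError:
            a = s
        return a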
    
    
    if __name__ == '__main__':
        cf = configparser.ConfigParser()
        cf.read("config.ini", encoding="utf-8")
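        start_time = time.time()
        # Path of the workbook to process
        file_name1 = cf.get("File", "file_name1")
        # Name of the sheet to process
        sheet_name = cf.get("File", "sheet_name")
        # Output workbook for rows that fail a check (异常数据 = invalid data), next to the input file
        empty_file = os.path.join(os.path.dirname(file_name1), "异常数据.xlsx")
        # Output workbook for rows that pass every check (完整数据 = complete data)
        full_file = os.path.join(os.path.dirname(file_name1), "完整数据.xlsx")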
        excel = xlrd.open_workbook(file_name1)
        sheet = excel.sheet_by_name(sheet_name)
        rows = sheet.nrows
        cols = sheet.ncols
        row1 = sheet.row_values(0)
        count1 = 1
        count2 = 1

        # New workbook for rows that fail a check, with the same header row
        empty_excel = openpyxl.Workbook()
        empty_sheet = empty_excel.active
        empty_sheet.title = (sheet_name)
        empty_sheet.append(row1)
        # New workbook for rows that pass every check
        full_excel = openpyxl.Workbook()
        full_sheet = full_excel.active
        full_sheet.title = (sheet_name)
        full_sheet.append(row1)
    
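        # Column letter -> 1-based column number (single letters A-Z only)
        dic = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5,
               "F": 6, "G": 7, "H": 8, "I": 9, "J": 10,
               "K": 11, "L": 12, "M": 13, "N": 14, "O": 15,
               "P": 16, "Q": 17, "R": 18, "S": 19, "T": 20,
               "U": 21, "V": 22, "W": 23, "X": 24, "Y": 25, "Z": 26}

        # Column numbers of the fields to check; an empty or unknown letter gives 0, which skips the check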
        id_position = dic.get(cf.get("9yaosu", "id_position"), 0)
        sex_position = dic.get(cf.get("9yaosu", "sex_position"), 0)
        nationality_position = dic.get(cf.get("9yaosu", "nationality_position"), 0)
        phone_position = dic.get(cf.get("9yaosu", "phone_position"), 0)
        job_position = dic.get(cf.get("9yaosu", "job_position"), 0)
        id_address_position = dic.get(cf.get("9yaosu", "id_address_position"), 0)
        live_address_position = dic.get(cf.get("9yaosu", "live_address_position"), 0)
        job_description_position = dic.get(cf.get("9yaosu", "job_description_position"), 0)
        certificates_start_time_position = dic.get(cf.get("9yaosu", "certificates_start_time_position"), 0)
        certificates_end_time_position = dic.get(cf.get("9yaosu", "certificates_end_time_position"), 0)
    
        for i in range(1, rows):
            row_value = sheet.row_values(i)
            c1 = True
            c2 = True
            c3 = True
            c4 = True
            c5 = True
            c6 = True
            c7 = True
            c8 = True
            c9 = True
    
            # ==================================================================================
            # Check the ID card number and the gender
            if id_position > 0:
                id_card = row_value[id_position - 1].strip()
                # ID card format: 17 digits followed by a digit or "X", 18 characters in total
                c8 = (len(id_card) == 18 and id_card[:-1].isdecimal()
                      and (id_card[-1].isdecimal() or id_card[-1] == "X"))
                if sex_position > 0:  # The sheet has a gender column
                    sex = row_value[sex_position - 1].strip()
                    if c8:
                        # The second-to-last digit of the ID encodes gender: odd = 男 (male), even = 女 (female)
                        id_sex = "女" if int(id_card[-2]) % 2 == 0 else "男"
                        c1 = id_sex in sex
                    else:
                        # No usable ID: only check that the gender is 男 or 女
                        c1 = "男" in sex or "女" in sex
            else:  # No ID column: only check that the gender is 男 or 女
                if sex_position > 0:
                    sex = row_value[sex_position - 1].strip()
                    c1 = "男" in sex or "女" in sex
    
            # ==================================================================================
            # Check the nationality (must contain 中国 or CN)
            if nationality_position > 0:
                nationality = row_value[nationality_position - 1].strip()
                c2 = "中国" in nationality or "CN" in nationality
    
            # ==================================================================================
            # Check the phone number (8 or 11 digits)
            if phone_position > 0:
                phone = row_value[phone_position - 1].strip()
                c3 = phone.isdecimal() and (len(phone) == 8 or len(phone) == 11)
    
            # ==================================================================================
            # Check the occupation (must not be empty)
            if job_position > 0:
                job = row_value[job_position - 1].strip()
                c4 = len(job) > 0
    
            # ==================================================================================
            # Check the ID card address
            if id_address_position > 0:
                id_address = row_value[id_address_position - 1].strip()
                # More than 9 Chinese characters...
                c5_1 = (len(re.findall("([\u4e00-\u9fbb])", id_address)) > 9)
                # ...and at least one address keyword, tested as a substring so that "公司" can match too
                a = ["省", "市", "区", "县", "镇", "村", "湾", "弯", "巷", "弄", "公司", "厂", "室", "号", "户", "乡", "组"]
                c5 = c5_1 and any(keyword in id_address for keyword in a)
    
            # ==================================================================================
            # Check the residential address (same rules as the ID card address)
            if live_address_position > 0:
                live_address = row_value[live_address_position - 1].strip()
                c6_1 = (len(re.findall("([\u4e00-\u9fbb])", live_address)) > 9)
                a = ["省", "市", "区", "县", "镇", "村", "湾", "弯", "巷", "弄", "公司", "厂", "室", "号", "户", "乡", "组"]
                c6 = c6_1 and any(keyword in live_address for keyword in a)
    
            # ==================================================================================
            # Check the occupation description
            if job_description_position > 0:
                job_description = row_value[job_description_position - 1].strip()
                match = re.search(u"[\u4e00-\u9fbb]", job_description)
                # Placeholder answers that do not count as a description (无 = none, 一般人员/一般员工 = general staff)
                placeholders = ("无", "无无", "一般人员", "一般员工", "一贝人员")
                # Non-empty, not a placeholder, and containing at least one Chinese character
                c7 = (len(job_description) > 0
                      and job_description not in placeholders
                      and match is not None)
    
            # ==================================================================================
            # Check the ID validity period (start and end dates)
            if certificates_start_time_position > 0 and certificates_end_time_position > 0:
                a = row_value[certificates_start_time_position - 1]
                b = row_value[certificates_end_time_position - 1]
                # Date cells arrive from xlrd as floats; text cells may use "-" or "/".
                # Normalise both to YYYYMMDD so the slicing below compares like with like.
                if isinstance(a, float):
                    certificates_start_time = datetime(*xldate_as_tuple(a, excel.datemode)).strftime("%Y%m%d")
                else:
                    certificates_start_time = a.replace("-", "").replace("/", "").strip()
                if b:
                    if isinstance(b, float):
                        certificates_end_time = datetime(*xldate_as_tuple(b, excel.datemode)).strftime("%Y%m%d")
                    else:
                        certificates_end_time = b.replace("-", "").replace("/", "").strip()
                else:
                    # An empty end date is treated as long-term validity
                    certificates_end_time = "21991231"
                try:
                    start_year = int(certificates_start_time[0:4])
                    start_month = certificates_start_time[4:]
                    end_year = int(certificates_end_time[0:4])
                    end_month = certificates_end_time[4:]
                except ValueError:
                    start_year = 0
                    start_month = ""
                    end_year = 0
                    end_month = ""
    
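                # Valid: a 5-, 10- or 20-year period with the same month and day, or a long-term end date (20991231 / 21991231)
                youxiao1 = (end_year - start_year == 5) and (start_month == end_month)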
                youxiao2 = (end_year - start_year == 10) and (start_month == end_month)
                youxiao3 = (end_year - start_year == 20) and (start_month == end_month)
                youxiao4 = ("20991231" in certificates_end_time) or ("21991231" in certificates_end_time)
                youxiao = youxiao1 or youxiao2 or youxiao3 or youxiao4
                c9 = certificates_start_time and ("1899" not in certificates_start_time) and youxiao
    
            # Collect the reasons a row failed
            reason = ""
            if not c1:
                reason += "[invalid gender]"
            if not c2:
                reason += "[invalid nationality]"
            if not c3:
                reason += "[invalid phone number]"
            if not c4:
                reason += "[invalid occupation]"
            check_address = c5 and c6
            if not check_address:
                reason += "[invalid address]"
            if not c7:
                reason += "[invalid occupation description]"
            if not c8:
                reason += "[invalid ID card number]"
            if not c9:
                reason += "[invalid ID validity dates]"
    
            if c1 and c2 and c3 and c4 and check_address and c7 and c8 and c9:
                row_value1 = list(map(fix_illegal, row_value))
                full_sheet.append(row_value1)
                count1 = count1 + 1
            else:
                row_value1 = list(map(fix_illegal, row_value)) + [reason]
                empty_sheet.append(row_value1)
                count2 = count2 + 1
    
            print("\r", f"Processing row {i}...", end="", flush=True)

        # ==================================================================================
        # Print the number of complete / invalid rows
        print("\n")
        print(f"Complete rows: {count1 - 1}")
        print(f"Invalid rows: {count2 - 1}")
        # Save the invalid rows
        empty_excel.save(empty_file)
        # Save the complete rows
        full_excel.save(full_file)
        # Print the total run time
        end_time = time.time()
        print(f"Total time: {end_time - start_time:.2f}s")
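The address and ID-validity rules can also be pulled out into small functions that are easy to test. This is a sketch that mirrors the script's rules (more than 9 Chinese characters plus an address keyword; a 5-, 10- or 20-year period with matching month and day, or a long-term end date), assuming the dates have already been read as text; the function names are hypothetical:

```python
import re

# Keywords copied from the script; a valid address must contain at least one
ADDRESS_KEYWORDS = ["省", "市", "区", "县", "镇", "村", "湾", "弯", "巷", "弄",
                    "公司", "厂", "室", "号", "户", "乡", "组"]
LONG_TERM_END_DATES = ("20991231", "21991231")

def address_looks_valid(address):
    # More than 9 Chinese characters, plus a keyword found as a substring
    chinese_chars = re.findall("[\u4e00-\u9fbb]", address)
    return len(chinese_chars) > 9 and any(k in address for k in ADDRESS_KEYWORDS)

def normalise_date(text):
    # "2015-03-01", "2015/03/01" and "20150301" all become "20150301"
    return text.replace("-", "").replace("/", "").strip()

def certificate_dates_valid(start, end):
    start = normalise_date(start)
    end = normalise_date(end) or "21991231"  # empty end date = long-term
    if not start or "1899" in start:
        return False
    if any(d in end for d in LONG_TERM_END_DATES):
        return True
    try:
        years = int(end[:4]) - int(start[:4])
    except ValueError:
        return False
    # 5-, 10- or 20-year period with the same month and day
    return years in (5, 10, 20) and start[4:] == end[4:]

print(address_looks_valid("北京某某科技有限公司研发中心"))  # True (matched via "公司")
print(certificate_dates_valid("2015-03-01", "2025-03-01"))  # True
print(certificate_dates_valid("2015/03/01", "2024/03/01"))  # False
```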
    
    
  • Filtering numbers in two sets that satisfy a given condition with Python (unique matches only)

    Question: how can I use Python to filter the data in two sets of numbers that satisfies a given condition, keeping only unique matches?

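    A unique match means: in any pair of numbers found, the number from one set may match only one single number in the other set; if there are multiple matching combinations, all of them are discarded.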

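    In other words, find the matching data and then check for duplicates; the final result is the set of matched numbers from the two sets that satisfy the condition and correspond to each other uniquely.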

    Assume each set holds about a million numbers.

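    Interesting.

    It is simpler to sort first and then traverse, deleting from both sets the numbers that cannot be matched uniquely. The numbers left in the two sets are then equal in count and correspond one-to-one in order.

    Not counting the sort, the complexity is roughly O(mn), where m is the range.

    I can't think of a method with lower complexity ╮(╯_╰)╭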

    Pseudocode (JavaScript):

    for (let c1 of sortedSet1) {
      for (let c2 of sortedSet2) {
        if (Math.abs(c1 - c2) < range) {
          sortedSet1.delete(c1);
          sortedSet2.delete(c2);
        }
      }
    }
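A sketch in Python of one way to compute the unique matches, assuming (as in the pseudocode) that two numbers match when their difference is smaller than a given tolerance; treating the answer's "range" as that tolerance is an assumption. This swaps in a different technique from the nested loop: both sets are sorted and binary search (the bisect module) counts each number's partners, so the cost is dominated by the sorting, O(N log N) for N numbers in total. The function name is hypothetical:

```python
from bisect import bisect_left, bisect_right

def unique_matches(set1, set2, tolerance):
    """Pairs (x, y) with |x - y| < tolerance where x matches only y and y matches only x.

    Numbers with zero or several partners are discarded.
    """
    a = sorted(set1)
    b = sorted(set2)

    def partners(sorted_values, v):
        # Index range [lo, hi) of values strictly inside (v - tolerance, v + tolerance)
        lo = bisect_right(sorted_values, v - tolerance)
        hi = bisect_left(sorted_values, v + tolerance)
        return lo, hi

    pairs = []
    for x in a:
        lo, hi = partners(b, x)
        if hi - lo != 1:
            continue  # x has no partner, or more than one
        y = b[lo]
        lo2, hi2 = partners(a, y)
        if hi2 - lo2 == 1:  # y's only partner is x
            pairs.append((x, y))
    return pairs

print(unique_matches({1, 10, 20}, {2, 11, 12, 30}, 3))  # [(1, 2)]
```

Here 10 is discarded because it matches both 11 and 12, and 20 and 30 have no partner at all.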

    Article from 编橙之家.

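  • How to use the Python filter function (in detail): Python's built-in filter() function can be used to create a new iterator from an existing iterable (such as a list or dictionary) that efficiently filters out elements using a function we provide. It returns items in sequence so that we can use it in a for...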
  • Explains in detail how to use Python to find the duplicate lines shared by two files; a useful reference for interested readers.
  • A method for using Python to select the records in a dataset whose value in a given column is longer than 20 characters; hopefully a useful reference.
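  • Once you have studied a little computing and know a bit of programming, you start to see things very... A while ago I watched a colleague in operations spend more than an hour every day filtering data, sorting it into categories and exporting spreadsheets. It seemed a waste of time and there had to be a better way, so after looking into the situation I decided to build a solution...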
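  • Using python to find the Excel files you need: Recently my company asked me to find, among more than two thousand Excel files, the ones that contain sensitive information. After sampling some of the files, I found that the files with sensitive information all contained data with the same characteristics, so I used python to implement ... of those characteristics...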
  • Filtering files with python

    2019-11-30 11:11:43
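    Filtering files in a specified directory with python and copying them to another specified folder. The program below copies the txt files on drive C to the current path. The code is as follows: import os import shutil # import the shutil module ls = os.listdir('C:\\) # get the directory listing for name in ls: if ...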
  • Filtering the files with a given extension in a directory with Python

    2019-12-01 11:08:12
    From: https://stackoverflow.com/questions/2225564/get-a-filtered-list-of-files-in-a-directory Method one: glob.glob('145592*.jpg') glob.glob() is definitely the way...
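A minimal sketch that combines the two file snippets above: select the files in a directory by extension with pathlib's Path.glob() (the object-oriented counterpart of glob.glob()) and copy them with shutil.copy2(). The function name and directory layout are hypothetical, and the demo runs in a temporary directory:

```python
import shutil
import tempfile
from pathlib import Path

def copy_files_with_suffix(src_dir, dst_dir, suffix=".txt"):
    """Copy the files in src_dir whose names end with suffix into dst_dir.

    Hypothetical helper; returns the copied file names in sorted order.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob("*" + suffix)):
        if path.is_file():  # skip directories that happen to match
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    src.mkdir()
    for name in ("a.txt", "b.jpg", "c.txt"):
        (src / name).write_text("demo")
    print(copy_files_with_suffix(src, Path(tmp, "dst")))  # ['a.txt', 'c.txt']
```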
  • Filtering the rows that share the same value in one Excel column with python

    2018-11-13 11:34:57
    1. Install the pandas package from cmd: pip install pandas 2. Make sure the file path is correct 3. Also mind the file encoding: encoding = 'gbk' import pandas as pd # read the Excel file d...
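The pandas code in that snippet is truncated. As a dependency-free sketch of the same idea, keeping every row whose value in one column appears more than once, the rows below are hypothetical dictionaries standing in for spreadsheet rows. In pandas, the corresponding selection is df[df.duplicated(subset="column_name", keep=False)].

```python
from collections import Counter

# Hypothetical rows, standing in for the rows of a sheet
rows = [
    {"order": "A1", "customer": "Li"},
    {"order": "A2", "customer": "Wang"},
    {"order": "A3", "customer": "Li"},
    {"order": "A4", "customer": "Zhao"},
]

column = "customer"
counts = Counter(row[column] for row in rows)

# Keep every row whose value in `column` occurs more than once
duplicates = list(filter(lambda row: counts[row[column]] > 1, rows))
print([row["order"] for row in duplicates])  # ['A1', 'A3']
```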
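  • Python global and local variables explained: www.002pc.com's summary in "Filtering fields with the same attribute in python / Python global and local variables explained" is very practical for our python training. # Variables in Python: global and local variables # In many languages, when declaring a global...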
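  • Today's topic: the python filter. Introduction: Python's built-in filter() function can be used to create a new iterator from an existing iterable (such as a list or dictionary) that efficiently filters out elements using a function we provide. It returns items in sequence...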
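  • In Python 3: (1) the functionality of xrange was merged into range, and xrange no longer exists (see range vs. xrange usage); (2) filter no longer returns a list but only an iterator, which has to be wrapped in list(). Also note that after filtering with filter, ...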
  • Filtering large amounts of data with python

    2021-03-09 09:40:05
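    How can I filter out only the ride-hailing data for the approach lanes of an intersection? The existing data package contains data for the whole intersection, in the format shown below. Paid help is welcome; please send a private message.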
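  • I ran into a problem: the content of the XML files in a multi-language package had to be filtered and the text processed, and Python is a good choice for that. import sys import os def main(): icount = 0 fileName = sys.argv[1] + "\\resources.xml" outName = sys.argv[1] + "\\tmp....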
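  • When building a phylogeny from several thousand sequences, many seqs may have different names but identical sequences, and some analyses (such as selection-pressure analysis on datamonkey) require sequences without degenerate bases, so the sequences need to be filtered. Below is the python code I wrote: #!/usr/bin/env python fasta_file=...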
  • Filtering NaN values with python

    2021-03-17 13:41:56
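    NaN values cannot be matched directly with ==, so first replace them with a string, df = df.fillna('None'), and then filter: txt = df[df['零件家族'] == 'None'] (零件家族 is the "part family" column).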
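The reason == cannot find NaN is that NaN compares unequal to everything, including itself. A plain-Python sketch of that effect, and of testing with math.isnan() instead; in pandas the direct counterpart is df[df['零件家族'].isna()], which avoids the fillna('None') step:

```python
import math

nan = float("nan")
values = [1.5, nan, 3.0, float("nan")]

print(nan == nan)                        # False
print([v for v in values if v == nan])   # []
print(list(filter(math.isnan, values)))  # [nan, nan]
```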
  • usr/bin/python import sys import os '''String search function: looks up a value in a list using binary search''' def binarySearch(value, lines): right = len(lines) - 1 left = 0 a = value.strip() while left <= right: middle = ...
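  • 1. Description: this example implements stock screening. The first half filters for stocks with a P/E ratio between 0 and 30, a turnover rate above 1% today, and a gain of more than 2%. The second half counts the stocks that hit, or came close to, the daily limit-up today. 2. Program: #! usr/bin/python #coding=utf-8 import pandas as pd...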
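  • This post summarises the DataFrame selection methods (loc, iloc, ix, at, iat), using operations on a CSV file as the example. 1. Data selection, on sample data with columns a, b, c and index 0-9: 0 2 4 | 6 8 10 | 12 14 16 | 18 20 22 | 24 26 28 | 30 32 34 | 36 38 40 | 42 44 46 | 48 50 52 | 54 56 ...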
  • Filtering and moving files with python

    2020-05-12 00:06:06
    1. Problem description: the following files need to be split into two categories. 2. Solution, straight to the code: import os import glob import shutil #1 select the folders path10 = r'G:\GL' path20 = r'G:\镶嵌\2019GL'...
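  • [Abstract] This article explains how to read multiple text files, filter the information in them with regular expressions, and write the filtered information to a new text file. Opening a file: open('file name', 'mode') >>>file=open(r'C:\Users\yuanlei\...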
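  • Contents: 1 Overview; 2 Reading and writing Excel with Python; 3 Python regular expressions (3.1 Basics, 3.2 Worked examples); 4 pandas queries (4.1 Series queries, 4.2 DataFrame queries); 5 Filtering the national civil service exam job list; References; Afterword. 1 Overview: The 2021 national civil service exam is coming up; have you registered? If you are good with Office, ...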
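  • More than 8,000 funds can be bought publicly on 天天基金网 (Tiantian Fund). There are many possible strategies for picking the better ones to invest in; here I use a very simple, blunt method based on six time windows, namely 'last 1 week', 'last 1 month', 'last 3 months', 'last 6 months', 'last 1 year', 'last 2...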
