  • Example of a parallel for loop in Python


    Earlier I reposted an article on executing for loops in parallel in Python, so I wrote a small example of my own.
    The timeit module measures the running time.
    The operator module checks that the two runs return identical results.

    import multiprocessing
    import timeit
    import operator

    def do_something(x):
        v = pow(x, 2)
        return v

    if __name__ == '__main__':
        # Serial version: square every number in a plain for loop.
        a = []
        start = timeit.default_timer()
        for i in range(1, 100000000):
            a.append(do_something(i))

        end = timeit.default_timer()
        print('single processing time:', str(end - start), 's')
        print(a[1:10])

        # Revised to run in parallel on a pool of 4 worker processes.
        items = [x for x in range(1, 100000000)]
        p = multiprocessing.Pool(4)
        start = timeit.default_timer()
        b = p.map(do_something, items)
        p.close()
        p.join()
        end = timeit.default_timer()
        print('multi processing time:', str(end - start), 's')
        print(b[1:10])
        print('Return values are all equal ?:', operator.eq(a, b))

    Output:

    single processing time: 53.43800377573327s
    [4, 9, 16, 25, 36, 49, 64, 81, 100]
    multi processing time: 26.168114312830433s
    [4, 9, 16, 25, 36, 49, 64, 81, 100]
    Return values are all equal ?: True
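
    The same comparison can also be written with the standard-library concurrent.futures API. A minimal sketch, not from the original post, assuming Python 3.5+ (for the chunksize argument) and using a smaller range just to keep the test quick:

    import concurrent.futures
    import timeit

    def do_something(x):
        return pow(x, 2)

    if __name__ == '__main__':
        items = range(1, 1000000)
        start = timeit.default_timer()
        with concurrent.futures.ProcessPoolExecutor(max_workers=4) as ex:
            # chunksize batches many items per task, cutting inter-process overhead
            results = list(ex.map(do_something, items, chunksize=10000))
        end = timeit.default_timer()
        print('executor time:', str(end - start), 's')
        print(results[1:10])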
    
  • How can I speed up per-row pandas calculations (7 hours for 333k rows)?

    I am still at a very early stage of learning Python. Apologies in advance if this question sounds stupid.

    I have this set of data (in table format) that I want to add a few calculated columns to. Basically, I have location lon/lat and destination lon/lat plus the respective timestamps, and I'm calculating the average velocity between each pair.

    Sample data looks like this:

    print(data_all.head(3))

       id    lon_evnt   lat_evnt           event_time  \
    0   1 -179.942833  41.012467  2017-12-13 21:17:54
    1   2 -177.552817  41.416400  2017-12-14 03:16:00
    2   3 -175.096567  41.403650  2017-12-14 09:14:06

      dest_data_generate_time   lat_dest    lon_dest  \
    0 2017-12-13 22:33:37.980  37.798599 -121.292193
    1 2017-12-14 04:33:44.393  37.798599 -121.292193
    2 2017-12-14 10:33:51.629  37.798599 -121.292193

                                  address_fields_dest
    0  {'address': 'Nestle Way', 'city': 'Lathrop...
    1  {'address': 'Nestle Way', 'city': 'Lathrop...
    2  {'address': 'Nestle Way', 'city': 'Lathrop...

    I then zipped the lon/lat together:

    data_all['ping_location'] = list(zip(data_all.lon_evnt, data_all.lat_evnt))
    data_all['destination'] = list(zip(data_all.lon_dest, data_all.lat_dest))

    Then I want to calculate the distance between each pair of location pings, grab some address info from a string (basically taking a substring), and then calculate the velocity:

    for idx, row in data_all.iterrows():
        dist = gcd.dist(row['destination'], row['ping_location'])
        data_all.loc[idx, 'gc_distance'] = dist

        temp_idx = str(row['address_fields_dest']).find(":")
        pos_start = temp_idx + 3
        pos_end = str(row['address_fields_dest']).find(",") - 2
        data_all.loc[idx, 'destination address'] = str(row['address_fields_dest'])[pos_start:pos_end]

        ##### calculate velocity which is: v = d/t
        ## time is the difference btwn destination time and the ping creation time
        timediff = abs(row['dest_data_generate_time'] - row['event_time'])
        data_all.loc[idx, 'velocity km/hr'] = 0

        ## check if the time diff btwn destination and event ping is more than a minute long
        if timediff > datetime.timedelta(minutes=1):
            data_all.loc[idx, 'velocity km/hr'] = dist / timediff.total_seconds() * 3600.0

    OK, now this program took almost 7 hours to execute on 333k rows of data! I have Windows 10 with 2 cores and 16 GB RAM, which is not much, but 7 hours is definitely not OK.

    How can I make the program run more efficiently? One thought: since the rows and their calculations are independent of each other, I could take advantage of parallel processing.

    I've read many posts, but most of the parallel-processing methods they present assume I'm applying only one simple function, whereas here I'm adding multiple new columns.

    Any help is really appreciated! Or tell me that it's impossible to make pandas do parallel processing (I believe I've read that somewhere, but I'm not sure it's 100% true).


    Solution

    Here is a quick solution - I didn't try to optimize your code at all; I just fed it into a multiprocessing pool. This runs your function on each row individually, returns a row with the new properties, and creates a new dataframe from that output.

    import multiprocessing as mp
    import datetime
    import pandas as pd

    def func(arg):
        idx, row = arg
        dist = gcd.dist(row['destination'], row['ping_location'])
        row['gc_distance'] = dist

        temp_idx = str(row['address_fields_dest']).find(":")
        pos_start = temp_idx + 3
        pos_end = str(row['address_fields_dest']).find(",") - 2
        row['destination address'] = str(row['address_fields_dest'])[pos_start:pos_end]

        ##### calculate velocity which is: v = d/t
        ## time is the difference btwn destination time and the ping creation time
        timediff = abs(row['dest_data_generate_time'] - row['event_time'])
        row['velocity km/hr'] = 0

        ## check if the time diff btwn destination and event ping is more than a minute long
        if timediff > datetime.timedelta(minutes=1):
            row['velocity km/hr'] = dist / timediff.total_seconds() * 3600.0
        return row

    # create the pool after func is defined so the workers can find it
    pool = mp.Pool(processes=mp.cpu_count())
    new_rows = pool.map(func, [(idx, row) for idx, row in data_all.iterrows()])
    pool.close()
    pool.join()

    # each element of new_rows is a Series (one row); stack them into a DataFrame
    data_all_new = pd.DataFrame(new_rows)
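
    If sending one row per task proves slow, a common variant is to split the frame into a handful of chunks and let each worker loop over a whole chunk, which amortizes the per-task pickling cost. A hedged sketch, not from the original answer; process_chunk is a hypothetical helper, and func and data_all are the names from the code above:

    import multiprocessing as mp
    import numpy as np
    import pandas as pd

    def process_chunk(chunk):
        # apply the per-row logic above to one chunk and rebuild a small frame
        return pd.DataFrame([func((idx, row)) for idx, row in chunk.iterrows()])

    if __name__ == '__main__':
        chunks = np.array_split(data_all, mp.cpu_count())
        with mp.Pool(processes=mp.cpu_count()) as pool:
            data_all_new = pd.concat(pool.map(process_chunk, chunks))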

  • Executing a for loop in parallel in Python


    Introduction
    Before introducing the simplest way to achieve parallelism in Python, let's look at a small piece of code.

    words = ['apple', 'bananan', 'cake', 'dumpling']
    for word in words:
        print(word)
    

    In the example above, a for loop prints each word in the words list one at a time. The question is: each word can only be printed after the previous one finishes, so can we print them simultaneously? It's like queueing at a bank: with a single counter open you must wait for the person ahead of you to finish, but with several counters open things clearly move much faster.

    We can abstract the code above into the following pattern:

    items = list()
    for item in items:
        process(item)
    

    Here items is a list and process(arg) is a function that may or may not return a value. We would like to turn this pattern into parallel processing, for instance by introducing multithreading, but such approaches tend to make the code more complex. Is there a simpler way?

    Parallelization
    With a small change, the serial pattern above can be made to run in parallel:

    from multiprocessing.dummy import Pool as ThreadPool
    items = list()
    pool = ThreadPool()
    pool.map(process, items)
    pool.close()
    pool.join()
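
    On Python 3.3+ the pool also works as a context manager. A small sketch of the same pattern (process and items as above), not from the original article:

    from multiprocessing.dummy import Pool as ThreadPool

    with ThreadPool(4) as pool:  # 4 worker threads; the default is the CPU count
        results = pool.map(process, items)
    # __exit__ calls terminate(), which is safe here because map() has
    # already collected every result before the block ends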
    

    Now let's test it:

    # -*- coding: utf-8 -*-
    import logging
    import time
    from multiprocessing.dummy import Pool as ThreadPool

    def get_logger(name):
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        stream_handler = logging.StreamHandler()
        stream_handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s [%(levelname)s] %(message)s')
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
        return logger

    def process(item):
        log = get_logger(item)
        log.info("word: %s" % item)
        time.sleep(5)

    items = ['apple', 'bananan', 'cake', 'dumpling']
    pool = ThreadPool()
    pool.map(process, items)
    pool.close()
    pool.join()
    

    Output:

    2016-06-07 11:23:57,530 - apple [INFO] word: apple
    2016-06-07 11:23:57,530 - bananan [INFO] word: bananan
    2016-06-07 11:23:57,530 - cake [INFO] word: cake
    2016-06-07 11:23:57,531 - dumpling [INFO] word: dumpling
    

    The timestamps show that the words, which used to print one after another, are now printed in parallel.
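
    One caveat worth adding: multiprocessing.dummy drives the Pool API with threads, which suits I/O-bound work like the sleep() above; for CPU-bound loop bodies the GIL serializes the threads, so the usual swap is a process-based pool. A sketch under that assumption, reusing process and items from above (on Windows it must sit under an if __name__ == '__main__': guard):

    from multiprocessing import Pool

    pool = Pool()  # one worker process per CPU core by default
    results = pool.map(process, items)
    pool.close()
    pool.join()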

    Also, the process function above returns nothing. If process returned a value result, then results = pool.map(process, items) would return a list whose elements are the results of processing each item.

    Therefore, this serial pattern:

    results = list()
    for item in item_list:
        result = process(item)
        results.append(result)
    return results
    

    can be rewritten as the following parallel version:

    from multiprocessing.dummy import Pool as ThreadPool
    pool = ThreadPool()
    results = pool.map(process, item_list)
    pool.close()
    pool.join()
    

    For a hands-on example, see "Example of a parallel for loop in Python" above.

    References
    Parallelism in one line
    一行 Python 实现并行化


    Author: FunHacks, "最简单的 python 并行实现方式"
    Original: http://funhacks.net/2016/06/11/%E6%9C%80%E7%AE%80%E5%8D%95%E7%9A%84python%E5%B9%B6%E8%A1%8C%E5%AE%9E%E7%8E%B0%E6%96%B9%E5%BC%8F/
    Published under the Creative Commons BY-NC-ND 4.0 license (free to share, attribution required, non-commercial, no derivatives).

  • How can I execute a for loop in parallel using multiprocessing in Python 3.2?

    I'm quite new to Python (using Python 3.2) and I have a question concerning parallelisation. I have a for-loop that I wish to execute in parallel using "multiprocessing" in Python 3.2:

    def computation():
        global output
        for x in range(i, j):
            localResult = ...  # perform some computation as a function of i and j
            output.append(localResult)

    In total, I want to perform this computation for a range of i=0 to j=100. Thus I want to create a number of processes that each call the function "computation" with a subdomain of the total range. Any ideas of how to do this? Is there a better way than using multiprocessing?

    More specifically, I want to perform a domain decomposition and I have the following code:

    from multiprocessing import Pool

    class testModule:

        def __init__(self):
            self

        def computation(self, args):
            start, end = args
            print('start: ', start, ' end: ', end)

    testMod = testModule()
    length = 100
    np = 4
    p = Pool(processes=np)
    p.map(testMod.computation, [(length, startPosition, length//np)
                                for startPosition in range(0, length, length//np)])

    I get an error message mentioning PicklingError. Any ideas what could be the problem here?

    Solution

    Joblib is designed specifically to wrap around multiprocessing for the purposes of simple parallel looping. I suggest using that instead of grappling with multiprocessing directly.

    The simple case looks something like this:

    from joblib import Parallel, delayed

    Parallel(n_jobs=2)(delayed(foo)(i**2) for i in range(10)) # n_jobs = number of processes

    The syntax is simple once you understand it. We are using generator syntax in which delayed is used to call function foo with its arguments contained in the parentheses that follow.

    In your case, you should either rewrite your for loop with generator syntax, or define another function (i.e. 'worker' function) to perform the operations of a single loop iteration and place that into the generator syntax of a call to Parallel.

    In the latter case, you would do something like:

    Parallel(n_jobs=2)(delayed(foo)(parameters) for x in range(i,j))

    where foo is a function you define to handle the body of your for loop. Note that you do not want to append to a list, since Parallel is returning a list anyway.
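
    For completeness, a hedged sketch of that worker-function route applied to the domain decomposition in the question; compute_range is a hypothetical worker and the squaring is only a placeholder for the real per-subdomain computation:

    from joblib import Parallel, delayed

    def compute_range(start, end):
        # stand-in for the real computation over one subdomain
        return [x * x for x in range(start, end)]

    chunks = [(k, k + 25) for k in range(0, 100, 25)]  # 4 subdomains of 0..100
    partial = Parallel(n_jobs=4)(delayed(compute_range)(s, e) for s, e in chunks)
    output = [v for part in partial for v in part]  # flatten the per-chunk lists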

  • Simple fork/join parallelism for Python for loops. Quick start, parallel iteration across processes/CPUs: import os from parallelize import parallelize for i in parallelize(range(100)): print(os.getpid(), i) ...
  • 不进行并行for循环就是单进程迭代计算,demo的计算时间为7.3秒。但是并行后可以看到,后台有多个进程,速度加快至2.6秒。 当单次计算非常快时,由于开销,对多进程或线程的调用可能比顺序计算慢。因此较快速度的...
  • " I'm using Windows 10 64-bit and python 2.7. Please provide your solution by changing my code if you can. Thanks! 解决方案 If you want a more general solution, taking advantage of fully parallel ...
  • For one city's taxi data, a single day has 33,210,000 records. How do you move each vehicle's records into its own dedicated file? The idea is simple: loop over the 33,210,000 records and move each vehicle's data into the file where it belongs. ... Hence the need to parallelize...
  • For one city's taxi... hence the trick of running the for loop in parallel: since 30 million rows made the csv impossible to open, I used the split utility to cut it into 53 csv files of 600,000 rows each. My original plan was to read the folder and, from each 600k...
  • Preface ... the for loop. Here I use an example to walk through the for loop in detail: >>> name = 'rocky' >>> for i in name: ... print(i) ... r o c k y The example above...
  • You can use Parallel to run a loop across multiple threads: from joblib import Parallel, delayed import multiprocessing inputs = range(10) def processInput(i): return i * i num_cores = multiprocessing.cpu_count() results...
  • futures[word1].append(executer.submit(new_calculate_similarity, word1, word2)) for word in futures: # this will block until all calculations have completed for 'word' results = map(Future.result, futures[word])...
  • Is there any possibility to parallelize the following code in python? I was wondering how to convert this code with map and lambda functions.. values = (1, 2, 3, 4, 5) def op(x, y): return x + y [(i, j, op(i, j...
  • Using a single for loop to iterate over two lists at the same time, so that list 1 is traversed once while list 2 is traversed multiple times.
  • Python: while loops and for loops

    In Python programming, Python provides the for loop and the while loop. The basic while syntax is: while condition: statements... 1. The while loop repeatedly executes a block of statements, for tasks that need repeating. 2. The while loop repeatedly tests the same...
  • How to make Python for loops more efficient

    The task is to loop over 33,210,000 records and move each vehicle's data into its proper file. But looping one record at a time over 30+ million rows is far too slow: in 2 hours I had moved only 600,000 records, so 30 million would take roughly 100 hours, i.e. 4-5 days. And I'd still have to guarantee...
  • python中,for循环是执行串行计算,换句话说,for循环中的每个阶段一定要在上一个阶段完成之后才能被执行。当循环的代数特别多时,这样会非常耗费时间。 map( ) 这个函数能够将for循环的串行计算改变成并行计算。 ...
  • Parallel computing in Python

    Parallel computing in Python. I. Lab notes: this lab introduces tools that can be used for parallel computing in Python. 1. Environment: login is automatic with no password, system username shiyanlou. 2. Environment details: this course uses Spyder. First open a terminal, then enter the following command: git clone ...
  • Parallelizing a genetic algorithm with MPI for Python

    The author uses mpi4py, the Python interface to MPI, to add multi-process parallel acceleration to his genetic algorithm framework GAFT, and runs a simple test of the speedup.
  • Using while loops and for loops in Python

    The while statement provides a way to write general-purpose loops, whereas the for statement iterates over the elements of a sequence object, running a block of code for each element. break and continue are used inside a loop to exit the whole loop or to skip one iteration. I. The while loop. 1. General format: ...
  • The for loop in Python

    The while statement is very flexible: it repeatedly executes a block of code as long as a condition holds true. But to execute a block of code for every element of a collection (a sequence or other iterable object), use a for loop. Note: an iterable object is one that can be...
  • How do you write a for loop in Python? Python's for loop can iterate over the items of any sequence, such as a list or a string. The syntax of the for loop is: for iterating_var in sequence: statements(s) Example: #!...
