  • PyCUDA

    2020-12-09 00:48:23
    Would be nice to have PyCUDA added to defaults. Given it links against CUDA, this may be the only way to distribute it ATM. Though if there is a way to get it into conda-forge as ...
  • Pycuda

    2021-02-05 18:57:43

    Preface

    The references below, and many parts of this article, may contain errors; point them out and I will fix them right away.
    Reference 1
    Reference 2
    Reference 3

    step01

    Install the pycuda library.
    CUDA was missing, so I found CUDA 10.0 on Bing.
    Then cl.exe, which is needed for compilation, was also missing, so download Microsoft Visual Studio 2017 (it must not be the latest version).
    Finally, add cl.exe to the PATH environment variable, and a basic program will run:

    from time import time
    import numpy as np
    from pycuda import gpuarray
    import pycuda.autoinit
    import os
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule
    
    # Without the lines below, compilation failed at first because cl.exe was not on PATH
    _path = r"C:/Program Files (x86)/Microsoft Visual Studio/2017/Enterprise/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64"
    if os.system("cl.exe"):
        os.environ['PATH'] += ';' + _path
    if os.system("cl.exe"):
        raise RuntimeError("cl.exe still not found, path probably incorrect")
    
    def simple_speed_test():
        host_data = np.float32(np.random.random(50000000))
    
        t1 = time()
        host_data_2x =  host_data * np.float32(2)
        t2 = time()
        print(f'total time to compute on CPU: {t2 - t1}')
    
        device_data = gpuarray.to_gpu(host_data)
    
        t1 = time()
        device_data_2x =  device_data * np.float32(2)
        t2 = time()
    
        from_device = device_data_2x.get()
    
        print(f'total time to compute on GPU: {t2 - t1}')
        print(f'Is the host computation the same as the GPU computation? : {np.allclose(from_device, host_data_2x)}')
    
    simple_speed_test()
    

    step02

    When learning CUDA C programming, the part I personally find most important is choosing the thread index and designing the blocks and threads.
    PyCUDA comes with two built-in kernel helpers (the original screenshots illustrating them are not reproduced here; a sketch follows below), and there is also SourceModule for writing your own custom kernels, used in the code further below.
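
    Assuming the two built-in helpers shown in the missing screenshots are pycuda.elementwise.ElementwiseKernel and pycuda.reduction.ReductionKernel, a minimal sketch of both:

    import numpy as np
    import pycuda.autoinit
    from pycuda import gpuarray
    from pycuda.elementwise import ElementwiseKernel
    from pycuda.reduction import ReductionKernel
    
    # Elementwise helper: one output element per input element, no SourceModule needed
    double_it = ElementwiseKernel(
        "float *out, float *src",
        "out[i] = 2.0f * src[i]",
        "double_it")
    
    # Reduction helper: collapse an array to a single value (here, a sum of squares)
    sum_sq = ReductionKernel(np.float32, neutral="0",
                             reduce_expr="a+b", map_expr="x[i]*x[i]",
                             arguments="float *x")
    
    x = gpuarray.to_gpu(np.random.random(1024).astype(np.float32))
    y = gpuarray.empty_like(x)
    double_it(y, x)
    print(y.get()[:5])
    print(sum_sq(x).get())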

    # Write the kernel; the imports below make this block self-contained
    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule
    
    mod = SourceModule("""
    __global__ void CAL_RMS(float *a, float *b)
    {
        int idx = threadIdx.x + blockIdx.x*blockDim.x;
        a[idx] = (a[idx]-b[idx])*(a[idx]-b[idx])*96.04;
    }
    """)
    # Create the data (acc_data is the input sample array from the original article)
    acc_data = np.array(acc_data, dtype=np.float32)
    acc_mean = np.empty_like(acc_data)
    acc_mean[:] = np.float32(acc_data.mean())
    # Allocate device memory
    a_gpu = cuda.mem_alloc(acc_data.nbytes)
    b_gpu = cuda.mem_alloc(acc_mean.nbytes)
    # Copy host -> device
    cuda.memcpy_htod(a_gpu, acc_data)
    cuda.memcpy_htod(b_gpu, acc_mean)
    # Launch: 250 threads per block, one thread per element
    # (assumes acc_data.shape[0] is a multiple of 250; see the note below the index cases)
    func = mod.get_function("CAL_RMS")
    func(a_gpu, b_gpu, grid=(int(acc_data.shape[0]/250),), block=(250, 1, 1))
    # Copy the result back to the host
    result = np.empty_like(acc_data)
    cuda.memcpy_dtoh(result, a_gpu)
    

    This involves a very important concept:
    int idx = threadIdx.x + blockIdx.x*blockDim.x;
    is how the thread index is chosen in pycuda kernels, and it is closely tied to
    func(a_gpu, b_gpu, grid=(int(acc_data.shape[0]/250),), block=(250, 1, 1))
    provided you understand the GPU concepts of threads and thread blocks.
    How the thread index is computed in the common cases (not necessarily all correct, but case 4 at least is fine):

    // 1. N blocks, one thread per block
    dim3 dimGrid(N);
    dim3 dimBlock(1);
    threadId = blockIdx.x;
    
    // 2. M x N blocks, one thread per block
    dim3 dimGrid(M, N);
    dim3 dimBlock(1);
    // blockIdx.x ranges from 0 to M-1, blockIdx.y from 0 to N-1
    pos = blockIdx.y * gridDim.x + blockIdx.x;  // gridDim.x equals M
    
    // 3. One block with N threads
    dim3 dimGrid(1);
    dim3 dimBlock(N);
    threadId = threadIdx.x;
    
    // 4. M blocks, each with N threads
    dim3 dimGrid(M);
    dim3 dimBlock(N);
    threadId = threadIdx.x + blockIdx.x*blockDim.x;
    
    // 5. An M x N grid of blocks, each block with P x Q threads
    dim3 dimGrid(M, N);
    dim3 dimBlock(P, Q);
    threadId_x = blockIdx.x*blockDim.x + threadIdx.x;
    threadId_y = blockIdx.y*blockDim.y + threadIdx.y;
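
    Note that grid=(int(acc_data.shape[0]/250),) in the launch above silently drops the trailing elements whenever the length is not a multiple of 250. A minimal sketch of the usual fix (variable and kernel names here are illustrative, not from the original article): round the grid size up and guard the index inside the kernel.

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule
    
    mod = SourceModule("""
    __global__ void scale(float *a, int n)
    {
        int idx = threadIdx.x + blockIdx.x*blockDim.x;
        if (idx < n)                 // guard: the last block may be only partially used
            a[idx] = 2.0f * a[idx];
    }
    """)
    scale = mod.get_function("scale")
    
    data = np.random.random(1003).astype(np.float32)    # deliberately not a multiple of 250
    a_gpu = cuda.mem_alloc(data.nbytes)
    cuda.memcpy_htod(a_gpu, data)
    
    block = 250
    grid = (data.shape[0] + block - 1) // block          # round up instead of truncating
    scale(a_gpu, np.int32(data.shape[0]), grid=(grid, 1), block=(block, 1, 1))
    
    out = np.empty_like(data)
    cuda.memcpy_dtoh(out, a_gpu)
    print(np.allclose(out, 2 * data))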
    

    step03

    In the end it comes down to understanding more and writing more code, and finding the right places to adapt. For GPU parallelism, pycuda is the tool; for parallel computation on the CPU there are also three libraries that work with pandas:

    # 1. Replace apply with parallel_apply (Linux and macOS only)
    from pandarallel import pandarallel
    pandarallel.initialize()
    # df.parallel_apply(func, axis=1)
    
    # 2. Also targets apply
    import swifter
    # data.swifter.apply(func)
    
    # 3. A drop-in replacement for pandas, depending on the code
    import modin.pandas as pd
    

    Beyond that, you can also handle it with multithreading or multiprocessing, as in the example below.

    import threadpool
    # km_stake and apply_work come from the original article's context
    thread_data = list(np.array(km_stake))[1:]
    pool = threadpool.ThreadPool(8)                         # 8 worker threads
    requests = threadpool.makeRequests(apply_work, thread_data)
    [pool.putRequest(req) for req in requests]
    pool.wait()
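
    The threadpool package above is an old third-party library; a roughly equivalent sketch using only the standard library (apply_work and thread_data are the same names assumed above) would be:

    from concurrent.futures import ThreadPoolExecutor
    
    # Map apply_work over thread_data with 8 worker threads and wait for completion.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(apply_work, thread_data))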
    
  • PyCUDA Documentation

    2020-07-27 17:12:07


    PyCUDA gives you easy, Pythonic access to Nvidia’s CUDA parallel computation API. Several wrappers of the CUDA API already exist–so why the need for PyCUDA?

    • Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. PyCUDA knows about dependencies, too, so (for example) it won’t detach from a context before all memory allocated in it is also freed.

    • Convenience. Abstractions like pycuda.compiler.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with Nvidia’s C-based runtime.

    • Completeness. PyCUDA puts the full power of CUDA’s driver API at your disposal, if you wish.

    • Automatic Error Checking. All CUDA errors are automatically translated into Python exceptions.

    • Speed. PyCUDA’s base layer is written in C++, so all the niceties above are virtually free.

    • Helpful Documentation. You’re looking at it. ;)

    Here’s an example, to give you an impression:

    import pycuda.autoinit
    import pycuda.driver as drv
    import numpy
    
    from pycuda.compiler import SourceModule
    mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """)
    
    multiply_them = mod.get_function("multiply_them")
    
    a = numpy.random.randn(400).astype(numpy.float32)
    b = numpy.random.randn(400).astype(numpy.float32)
    
    dest = numpy.zeros_like(a)
    multiply_them(
            drv.Out(dest), drv.In(a), drv.In(b),
            block=(400,1,1), grid=(1,1))
    
    print(dest - a * b)

    (This example is examples/hello_gpu.py in the PyCUDA source distribution.)

    On the surface, this program will print a screenful of zeros. Behind the scenes, a lot more interesting stuff is going on:

    • PyCUDA has compiled the CUDA source code and uploaded it to the card.

      Note

      This code doesn’t have to be a constant–you can easily have Python generate the code you want to compile. See Metaprogramming.

    • PyCUDA’s numpy interaction code has automatically allocated space on the device, copied the numpy arrays a and b over, launched a 400x1x1 single-block grid, and copied dest back.

      Note that you can just as well keep your data on the card between kernel invocations–no need to copy data all the time (a minimal sketch of this follows the list).

    • See how there’s no cleanup code in the example? That’s not because we were lazy and just skipped it. It simply isn’t needed. PyCUDA will automatically infer what cleanup is necessary and do it for you.
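
    As an illustration (a minimal sketch, not part of the original documentation page), keeping data on the card between kernel invocations with pycuda.gpuarray looks like this:

    import numpy as np
    import pycuda.autoinit
    from pycuda import gpuarray
    
    # Upload once ...
    x = gpuarray.to_gpu(np.random.randn(400).astype(np.float32))
    
    # ... run several operations while the data stays on the device ...
    y = 2 * x      # each of these lines launches a kernel on the GPU
    z = y + x
    w = z * z
    
    # ... and copy back only the final result.
    print(w.get()[:5])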

    Curious? Let’s get started.

    Contents

    Note that this guide will not explain CUDA programming and technology. Please refer to Nvidia’s programming documentation for that.

    PyCUDA also has its own web site, where you can find updates, new versions, documentation, and support.


  • Install pycuda

    2019-01-07 13:59:25

    Install pycuda
    how to install

    pip install pycuda
    

    Tutorial

    https://documen.tician.de/pycuda/index.html
    

    Example

    https://wiki.tiker.net/PyCuda
    

    demo

    import pycuda.autoinit
    import pycuda.driver as drv
    import numpy
    
    from pycuda.compiler import SourceModule
    mod = SourceModule("""
    __global__ void multiply_them(float *dest, float *a, float *b)
    {
      const int i = threadIdx.x;
      dest[i] = a[i] * b[i];
    }
    """)
    
    multiply_them = mod.get_function("multiply_them")
    
    a = numpy.random.randn(400).astype(numpy.float32)
    b = numpy.random.randn(400).astype(numpy.float32)
    
    dest = numpy.zeros_like(a)
    multiply_them(
            drv.Out(dest), drv.In(a), drv.In(b),
            block=(400,1,1), grid=(1,1))
    
    print(dest-a*b)
    
  • PyCUDA lets you access Nvidia's CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist-so what's so special about PyCUDA? Object cleanup tied to lifetime of ...
  • Installing pycuda

    2020-10-31 11:46:54
    Download: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycuda

    import pycuda
    
    print(pycuda.VERSION)

     

  • About pycuda. Home page: ; Package license: MIT; Feedstock license: ; Summary: a Python wrapper for CUDA; Development: ; Documentation: ; PyCUDA lets you access the GPU from Python through the CUDA parallel computation interface. Current build status ...
  • Pycuda import error

    2020-12-27 20:10:22
    import pycuda.driver -> ModuleNotFoundError: No module named 'pycuda'. Is pycuda not installed through the conda environment (.yml) file? This question comes from the open-source project: ...
  • Often we want to perform custom operations on tensors, and one way to implement them is with pycuda. Taking the addition of two tensors as an example, this article explains how to make pycuda and pytorch interact. 1. How pycuda is used. First, look at how the pycuda documentation defines pycuda: ...
  • Installed pycuda 2019.1.2 in anaconda; when running the demo, this appeared: import pycuda.gpuarray as gpuarray  # import the GPU-side array  File "D:\zahid\Anaconda3\envs\pt-gpu\lib\site-packages\pycuda\gpuarray.py", line 4, ...
  • Contributing PyCUDA routines

    2020-11-27 02:15:47
    I stumbled across this project looking for some PyCUDA routines that operate on matrices per-row or per-column. It seems you have a bunch of handy routines for this, which is awesome, e.g. row-wise...
  • A brief introduction to using pycuda

    2020-05-03 09:43:37
    Environment: win10, visual studio 2019, pycuda 2019.02. Before you use PyCuda, you first need to import and initialize it: import pycuda.driver as cuda; import pycuda.autoinit; from pycuda.compiler import SourceModule ...
  • problem about pycuda

    2020-12-27 23:05:31
    I ran install_pycuda.sh under the ssd folder, but a problem occurred. I do not know how to solve this problem, can you help me?: pycuda-2019.1.2/test/test_gpuarray.py I ...
  • broadcast_pycuda (source code)

    2021-02-10 10:30:55
    Broadcasting for PyCUDA GPUArray. Features: extends the pycuda.elementwise.Elementwise class to enable broadcasting, and extends pycuda.gpuarray.GPUArray to use the modified Elementwise. How it works: you can find this example in ... Import ElementwiseKernel from src and ...
  • Installing pyCUDA on Ubuntu

    2019-09-06 20:10:41
    0. Preface. Install environment: ubuntu18.04 (16 and 18 are about the same, but 18 is really nice) and python3 (I forget the exact version; it should be ... Installing with pip3 usually times out on an ordinary server; in that case you can install from the Tsinghua mirror or another domestic mirror. The standard command is "pip3 install pycuda...
  • The python pycuda interface

    2019-08-12 21:39:21
    from pycuda.compiler import SourceModule; import pycuda.autoinit; import pycuda.driver as drv; mod = SourceModule("a CUDA program written in C"); func = mod.get_function("func"). Calling cuda from the Python language...
  • Cannot pip install pycuda

    2020-11-25 20:48:05
    I have an issue installing pyCUDA using "pip install pycuda." I am running Mac OS X Mavericks 10.9. I installed CUDA 6.5 SDK for Mac 10.9. Xcode 5.1.1 and the developer tools are also ...
  • Installing pycuda with conda

    2020-05-19 11:30:09
    Some problems may come up, and the biggest cause is that the conda version is too new: the latest conda defaults to Python 3.7, while pycuda only supports up to 3.6, so you need to create a virtual environment with a lower Python version (see: creating a virtual environment with conda). Then install directly with pip: pip install pycuda ...
  • pycuda-2019.1.2.tar.gz

    2021-06-07 21:59:56
    PyCUDA installation package
  • Installing pycuda on the Jetson series

    2021-05-17 14:35:57
    pycuda must be installed for parallel-computing programming in python. On the TX2, installing pycuda with the command "pip install pycuda" kept failing (the error screenshot is not reproduced here). So I downloaded pycuda and installed it from the source package, which succeeded! pycuda download link: ...
  • pycuda is used to accelerate python, provided your computer has an Nvidia graphics card.
  • Hello, I installed pycuda on my machine successfully, after installing visual studio c++ 14.0. However even if I can call nvcc from the shell, pycuda can not locate Cuda or nvcc. Is there...
  • Installing pycuda on the TX2

    2021-01-13 11:38:07
    Installing pycuda on the TX2: 1. Download the latest pycuda: https://pypi.org/project/pycuda/#files 2. cd pycuda-VERSION; python configure.py --cuda-root=/where/ever/you/installed/cuda 3. python3 configure.py --cuda-root...
