Getting Started with SNPE (3): The API in Detail
Published 2022-03-03 18:24:42

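In the previous post, "Running inceptionV3 on the 888's HTP", we tested inceptionv3 inference on the CPU, GPU, and HTP of a Snapdragon 888 phone, and the HTP turned out to be more than 7 times faster than the GPU. Besides speed, the HTP's other big advantage is low power consumption, which matters a great deal for mobile apps.

Below I briefly show how to use the HTP from an app through the SNPE API, speeding up the model while reducing power consumption.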

This post covers:

  1. An introduction to the SNPE Native C++ API
  2. Running the sample code on the HTP of a Snapdragon 888 phone
  3. An introduction to the debugging APIs

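SNPE API Overview

The SDK ships a C++ tutorial and the sample code that this post walks through:

<path to snpe sdk>/doc/html/cplus_plus_tutorial.html
<path to snpe sdk>\examples\NativeCpp\SampleCode\jni\main.cpp

In short, calling SNPE takes the following steps:

  1. Check which runtimes are available (optional).
  2. Load the network's DLC model.
  3. Configure the SNPE options and create an SNPE instance.
  4. Load the model's input data.
  5. Run inference and collect the output data.
  6. Release SNPE.

Figure: SNPE API Basic Call Flow

The corresponding code in the sample is: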

// <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\main.cpp
static zdl::DlSystem::Runtime_t runtime = checkRuntime();
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine); // ITensor
snpe->execute(inputTensor.get(), outputTensorMap);// ITensor
snpe.reset();

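1. Check the available runtimes (optional)

A runtime is one of CPU, GPU, DSP, HTA, or HTP.

Different phones expose different sets of runtimes.

[Note] Developers can only use the unsigned PD on the DSP/HTP (see "unsigned PD" in the Hexagon DSP SDK for details), so the runtime check for the HTP must pass the UNSIGNEDPD_CHECK option.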

// Check whether the DSP/HTP is available.
zdl::DlSystem::Runtime_t checkRuntime()
{
    static zdl::DlSystem::Version_t Version = zdl::SNPE::SNPEFactory::getLibraryVersion();
    static zdl::DlSystem::Runtime_t Runtime;
    std::cout << "SNPE Version: " << Version.asString().c_str() << std::endl; //Print Version number
    // Passing zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK checks availability for the unsigned PD.
    // <path to snpe sdk>/doc/html/group__c__plus__plus__apis.html#ga960452d40eef91090973a17a438eaabd
    if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::DSP, zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK)) {
        Runtime = zdl::DlSystem::Runtime_t::DSP; // use the DSP/HTP when it is available
    } else {
        Runtime = zdl::DlSystem::Runtime_t::CPU; // otherwise fall back to the CPU
    }
    return Runtime;
}
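
The check above picks between two runtimes, but the same idea extends to an ordered preference list. The sketch below is a stand-in rather than SNPE code: `Runtime`, `pickRuntime`, and the `isAvailable` callback are hypothetical names introduced here for illustration. In a real app the callback would presumably wrap `zdl::SNPE::SNPEFactory::isRuntimeAvailable` with the `UNSIGNEDPD_CHECK` option shown above.

```cpp
#include <functional>
#include <vector>

// Stand-in for zdl::DlSystem::Runtime_t (hypothetical, for illustration only).
enum class Runtime { CPU, GPU, DSP };

// Returns the first runtime in `preferred` that `isAvailable` accepts.
// Falls back to CPU when none of the preferred runtimes is available.
Runtime pickRuntime(const std::vector<Runtime>& preferred,
                    const std::function<bool(Runtime)>& isAvailable)
{
    for (Runtime candidate : preferred) {
        if (isAvailable(candidate)) {
            return candidate;
        }
    }
    return Runtime::CPU;
}
```

Calling `pickRuntime({Runtime::DSP, Runtime::GPU}, check)` tries the DSP first, then the GPU, and only then settles for the CPU, which keeps the fallback policy in one place instead of spreading it across nested if/else branches.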

2. Load the model

Load the model file in DLC format; it is used to create the SNPE instance.

// containerPath is the path to the DLC file
std::unique_ptr<zdl::DlContainer::IDlContainer> loadContainerFromFile(std::string containerPath)
{
    std::unique_ptr<zdl::DlContainer::IDlContainer> container;
    container = zdl::DlContainer::IDlContainer::open(containerPath);
    return container;
}

3. Create the SNPE instance

(1) Set platformConfig (optional)

[Note] To use the DSP/HTP, you must select unsignedPD:ON here.

zdl::DlSystem::PlatformConfig platformConfig;
std::string PlatformOptions = "unsignedPD:ON";
// check platform options
if (PlatformOptions.length() > 0) {
    bool setSuccess = platformConfig.setPlatformOptions(PlatformOptions);
    bool isValid = platformConfig.isOptionsValid();
    std::cout << "PlatformOptions (" << PlatformOptions << ") set " << (setSuccess ? "successful" : "failed")
              << " config option is " << (isValid ? "valid" : "invalid") << std::endl;
    if (!setSuccess || !isValid) {
        return EXIT_FAILURE;
    }
}

(2) Configure the options and create SNPE

std::unique_ptr<zdl::SNPE::SNPE> setBuilderOptions(std::unique_ptr<zdl::DlContainer::IDlContainer> & container,
                                                   zdl::DlSystem::Runtime_t runtime,
                                                   zdl::DlSystem::RuntimeList runtimeList,
                                                   bool useUserSuppliedBuffers,
                                                   zdl::DlSystem::PlatformConfig platformConfig,
                                                   bool useCaching)
{
    std::unique_ptr<zdl::SNPE::SNPE> snpe;
    zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
    if(runtimeList.empty())
    {
        runtimeList.add(runtime);
    }
    snpe = snpeBuilder.setOutputLayers({})
       .setRuntimeProcessorOrder(runtimeList)
       .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
       .setPlatformConfig(platformConfig) // the options set in (1)
       .setInitCacheMode(useCaching) // init caching speeds up initialization
       .build();
    return snpe;
}

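4. Load the model's input data

There are two ways to load input data: ITensors and User Buffers.

The advantage of User Buffers is that SNPE maps directly onto the data buffer the user created, which avoids copying the data into an ITensor and so saves the copy time.

(1) For ITensor usage, see:

  • <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\LoadInputTensor.cpp
  • <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\SaveOutputTensor.cpp

(2) For User Buffers usage, see: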

  • <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\CreateUserBuffer.cpp
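
Whichever path you choose, the input starts as raw bytes on disk (the input list used later in this post points at .raw files). As a rough, SNPE-independent sketch of that first step, the helper below reads a file into a float buffer that could then be copied into an ITensor or handed over as a user buffer. It assumes the file holds headerless 32-bit floats, which is an assumption about the data rather than something SNPE enforces; `loadRawFloats` and `elementCount` are hypothetical names, not SDK functions.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Number of elements implied by a tensor shape, e.g. {1, 299, 299, 3}.
std::size_t elementCount(const std::vector<std::size_t>& dims)
{
    return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}

// Reads a headerless file of 32-bit floats into memory (hypothetical helper).
// Throws std::runtime_error if the file cannot be read or if its size is not
// a whole number of floats.
std::vector<float> loadRawFloats(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff bytes = in.tellg();
    if (bytes < 0 || static_cast<std::size_t>(bytes) % sizeof(float) != 0) {
        throw std::runtime_error("unexpected size for " + path);
    }
    std::vector<float> data(static_cast<std::size_t>(bytes) / sizeof(float));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(data.data()), bytes);
    if (!in) {
        throw std::runtime_error("short read on " + path);
    }
    return data;
}
```

Comparing `loadRawFloats(path).size()` against `elementCount(dims)` for the model's input shape is a cheap way to catch a wrongly preprocessed file before it reaches the network.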

5. Run SNPE inference

// ITensor mode
snpe->execute(inputTensor.get(), outputTensorMap);
// User Buffers mode
snpe->execute(inputMap, outputMap);

6. Release SNPE

snpe.reset();


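Running the Sample Code

Continuing with the docker container environment from the previous post, we will modify and build SNPE's Native C++ Sample Code and run inceptionV3 inference on the HTP.

1. Install the NDK and set the build environment variables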

root@c633e07fbd33:/opt# wget https://dl.google.com/android/repository/android-ndk-r19b-linux-x86_64.zip
root@c633e07fbd33:/opt# unzip -q  android-ndk-r19b-linux-x86_64.zip
root@c633e07fbd33:/opt# rm android-ndk-r19b-linux-x86_64.zip


root@c633e07fbd33:/workspace/tutor/inceptionv3# export ANDROID_NDK_ROOT=/opt/android-ndk-r19b/
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp $SNPE_ROOT/examples/NativeCpp/SampleCode . -r

2. Build the Sample Code

(1) Build snpe-sample with ndk-build

// Copy SampleCode from the SNPE SDK to the working directory
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp $SNPE_ROOT/examples/NativeCpp/SampleCode . -r
root@c633e07fbd33:/workspace/tutor/inceptionv3# cd SampleCode/jni/
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# export PATH=$PATH:$ANDROID_NDK_ROOT
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# ndk-build

(2) Run snpe-sample on the phone

Here we reuse the library and model files that were pushed to the phone in the previous post.

root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb push ../obj/local/arm64-v8a/snpe-sample /data/local/tmp/incpv3/arm64/bin
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb shell
haydn:/ $ cd  /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
// Test snpe-sample
haydn:/data/local/tmp/incpv3 $ snpe-sample -h

DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.


REQUIRED ARGUMENTS:
-------------------
  -d  <FILE>   Path to the DL container containing the network.
  -i  <FILE>   Path to a file listing the inputs for the network.
  -o  <PATH>   Path to directory to store output results.

OPTIONAL ARGUMENTS:
-------------------
  -b  <TYPE>   Type of buffers to use [USERBUFFER_FLOAT, USERBUFFER_TF8, ITENSOR, USERBUFFER_TF16] (ITENSOR is default).
  -r  <RUNTIME> The runtime to be used [gpu, dsp, aip, cpu] (cpu is default).
  -u  <VAL,VAL> Path to UDO package with registration library for UDOs.
                Optionally, user can provide multiple packages as a comma-separated list.
  -z  <NUMBER>  The maximum number that resizable dimensions can grow into.
                Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a positive integer and is not applicable when using ITensor.
  -s  <TYPE>   Source of user buffers to use [GLBUFFER, CPUBUFFER] (CPUBUFFER is default).
  -c           Enable init caching to accelerate the initialization process of SNPE. Defaults to disable.
  -l  <VAL,VAL,VAL> Specifies the order of precedence for runtime e.g  cpu_float32, dsp_fixed8_tf etc. Valid values are:-
                    cpu_float32 (Snapdragon CPU)       = Data & Math: float 32bit
                    gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
                    dsp_fixed8_tf (Hexagon DSP)        = Data & Math: 8bit fixed point Tensorflow style format
                    gpu_float16 (Adreno GPU)           = Data: float 16bit Math: float 16bit
                    cpu (Snapdragon CPU)               = Same as cpu_float32
                    gpu (Adreno GPU)                   = Same as gpu_float32_16_hybrid
                    dsp (Hexagon DSP)                  = Same as dsp_fixed8_tf
// Run snpe-sample
haydn:/data/local/tmp/incpv3 $ snpe-sample -d inception_v3_htp.dlc -i target_raw_list.txt -r dsp
SNPE Version: 1.52.0.2724
Selected runtime not present. Falling back to CPU.

The run prints "Selected runtime not present. Falling back to CPU." because the stock SampleCode does not enable the unsigned PD.

(3) Enable the unsigned PD

Code change

diff --git a/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp b/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
index 82c9c10..8ba67ce 100755
--- a/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
+++ b/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
@@ -23,7 +23,7 @@ zdl::DlSystem::Runtime_t checkRuntime(zdl::DlSystem::Runtime_t runtime)
 
     std::cout << "SNPE Version: " << Version.asString().c_str() << std::endl; //Print Version number
 
-    if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(runtime))
+    if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(runtime, zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK))
     {
         std::cerr << "Selected runtime not present. Falling back to CPU." << std::endl;
         runtime = zdl::DlSystem::Runtime_t::CPU;
diff --git a/examples/NativeCpp/SampleCode/jni/main.cpp b/examples/NativeCpp/SampleCode/jni/main.cpp
index 6ec2f95..8ad06bc 100755
--- a/examples/NativeCpp/SampleCode/jni/main.cpp
+++ b/examples/NativeCpp/SampleCode/jni/main.cpp
@@ -324,6 +324,18 @@ int main(int argc, char** argv)
         return EXIT_FAILURE;
     }
 
+    std::string PlatformOptions = "unsignedPD:ON";
+
+	// check platform options
+	if (PlatformOptions.length() > 0) {
+	  bool setSuccess = platformConfig.setPlatformOptions(PlatformOptions);
+	  bool isValid = platformConfig.isOptionsValid();
+	  std::cout << "PlatformOptions (" << PlatformOptions << ") set " << (setSuccess ? "successful" : "failed")
+				<< " config option is " << (isValid ? "valid" : "invalid") << std::endl;
+	  if (!setSuccess || !isValid) {
+		 return EXIT_FAILURE;
+	  }
+	}
     snpe = setBuilderOptions(container, runtime, runtimeList, useUserSuppliedBuffers, platformConfig, usingInitCaching);
     if (snpe == nullptr)
     {

Rebuild and rerun snpe-sample; this time it runs on the DSP/HTP.

root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# ndk-build
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb push ../obj/local/arm64-v8a/snpe-sample /data/local/tmp/incpv3/arm64/bin

root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb shell
haydn:/ $ cd  /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
haydn:/data/local/tmp/incpv3 $ snpe-sample -d inception_v3_htp.dlc -i target_raw_list.txt -r dsp
SNPE Version: 1.52.0.2724
PlatformOptions (unsignedPD:ON) set successful config option is valid
Batch size for the container is 1
Processing DNN Input: cropped/notice_sign.raw
Processing DNN Input: cropped/trash_bin.raw
Processing DNN Input: cropped/plastic_cup.raw
Processing DNN Input: cropped/chairs.raw

Debugging APIs

1. Output the intermediate layers

In debug mode, the output of every layer is produced.

  • setDebugMode()
  • SNPEBuilder& setDebugMode(bool debugMode)

2. Collect per-layer execution time

Setting profilingLevel to DETAILED records each layer's execution time in diag.log.

  • setProfilingLevel()
  • SNPEBuilder& setProfilingLevel(zdl::DlSystem::ProfilingLevel_t profilingLevel)
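
The per-layer numbers come from SNPE itself. For a coarse end-to-end figure to sanity-check them against, one option is to time the execute call from the application side. The helper below is a generic std::chrono sketch, not part of the SNPE API, and `elapsedMs` is a hypothetical name.

```cpp
#include <chrono>
#include <utility>

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsedMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the sample code this would wrap the inference call, for example `elapsedMs([&] { snpe->execute(inputTensor.get(), outputTensorMap); })`. A single measurement includes warm-up effects, so averaging over several runs gives a steadier number.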

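Summary

So far we have:

  • Walked through the SNPE API call flow
  • Modified SampleCode to run Inceptionv3 inference in the unsigned PD on the DSP/HTP
  • Introduced the debugging APIs

Related posts:

Part 1: [Getting Started with SNPE (1): Running InceptionV3 Inference]

Part 2: [Getting Started with SNPE (2): Running inceptionV3 Inference on a Phone]

Author: Wenhao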
