FastDeploy: Fast AI Model Deployment
FastDeploy is an easy-to-use, flexible, and highly efficient AI inference deployment tool for all scenarios. It provides an out-of-the-box deployment experience across cloud, edge, and device, supports more than 150 text, vision, speech, and cross-modal models, and delivers end-to-end inference performance optimization. Covered tasks include image classification, object detection, image segmentation, face detection, face recognition, keypoint detection, matting, OCR, NLP, and TTS, meeting developers' deployment needs across scenarios, hardware, and platforms.
FastDeploy is open-sourced by PaddlePaddle at https://github.com/PaddlePaddle/FastDeploy; all updates are subject to the official releases.
Compiling the Deployment Library for Jetson
On Jetson, FastDeploy currently supports only three inference backends: ONNX Runtime (CPU), TensorRT (GPU), and Paddle Inference (GPU).
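As a minimal sketch, using the Python API introduced later in this section, each backend is selected through a RuntimeOption; pick exactly one backend per runtime:

import fastdeploy as fd

option = fd.RuntimeOption()
# ONNX Runtime on the CPU
option.use_cpu()
option.use_ort_backend()
# TensorRT on the GPU (alternative; pick one backend)
# option.use_gpu(0)
# option.use_trt_backend()
# Paddle Inference on the GPU (alternative)
# option.use_gpu(0)
# option.use_paddle_backend()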
Building and Installing the C++ SDK
The build requires:
gcc/g++ >= 5.4 (8.2 recommended)
cmake >= 3.10.0
JetPack >= 4.6.1
To integrate the Paddle Inference backend, download the JetPack C++ package matching your environment from the Paddle Inference prebuilt library page and extract it.
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
mkdir build && cd build
# -DENABLE_PADDLE_BACKEND and -DPADDLEINFERENCE_DIRECTORY are optional;
# omit them if the Paddle Inference backend is not needed.
cmake .. -DBUILD_ON_JETSON=ON \
         -DENABLE_VISION=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DPADDLEINFERENCE_DIRECTORY=/Download/paddle_inference_jetson \
         -DCMAKE_INSTALL_PREFIX=${PWD}/installed_fastdeploy
make -j8
make install
Once the build finishes, the C++ inference library is installed under the directory given by CMAKE_INSTALL_PREFIX.
C++ Inference
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "fastdeploy/runtime.h"

namespace fd = fastdeploy;

int main(int argc, char* argv[]) {
  std::string model_file = "mobilenetv2/inference.pdmodel";
  std::string params_file = "mobilenetv2/inference.pdiparams";

  // Set up the runtime option: Paddle model, ONNX Runtime backend on CPU.
  fd::RuntimeOption runtime_option;
  runtime_option.SetModelPath(model_file, params_file, fd::ModelFormat::PADDLE);
  runtime_option.UseOrtBackend();
  runtime_option.SetCpuThreadNum(12);

  // Initialize the runtime.
  std::unique_ptr<fd::Runtime> runtime(new fd::Runtime());
  if (!runtime->Init(runtime_option)) {
    std::cerr << "--- Init FastDeploy Runtime Failed! "
              << "\n--- Model: " << model_file << std::endl;
    return -1;
  } else {
    std::cout << "--- Init FastDeploy Runtime Done! "
              << "\n--- Model: " << model_file << std::endl;
  }

  // Query the input info and fill a tensor with random data.
  fd::TensorInfo info = runtime->GetInputInfo(0);
  info.shape = {1, 3, 224, 224};

  std::vector<fd::FDTensor> input_tensors(1);
  std::vector<fd::FDTensor> output_tensors(1);

  std::vector<float> inputs_data;
  inputs_data.resize(1 * 3 * 224 * 224);
  for (size_t i = 0; i < inputs_data.size(); ++i) {
    inputs_data[i] = std::rand() % 1000 / 1000.0f;
  }
  input_tensors[0].SetExternalData({1, 3, 224, 224}, fd::FDDataType::FP32,
                                   inputs_data.data());
  // The input tensor must carry the model's input name.
  input_tensors[0].name = info.name;

  // Run inference and print the output tensor's metadata.
  runtime->Infer(input_tensors, &output_tensors);
  output_tensors[0].PrintInfo();
  return 0;
}
Building and Installing the Python Package
The build likewise requires:
gcc/g++ >= 5.4 (8.2 recommended)
cmake >= 3.10.0
JetPack >= 4.6.1
python >= 3.6
Python packaging depends on wheel, so run pip install wheel before building.
To integrate the Paddle Inference backend, download the JetPack C++ package matching your environment from the Paddle Inference prebuilt library page and extract it.
All build options are passed in through environment variables:
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/python
export BUILD_ON_JETSON=ON
export ENABLE_VISION=ON
# ENABLE_PADDLE_BACKEND and PADDLEINFERENCE_DIRECTORY are optional
export ENABLE_PADDLE_BACKEND=ON
export PADDLEINFERENCE_DIRECTORY=/Download/paddle_inference_jetson
python setup.py build
python setup.py bdist_wheel
When the build completes, the wheel package is generated under FastDeploy/python/dist and can be installed directly with pip install.
If you change build options, delete the build and .setuptools-cmake-build subdirectories under FastDeploy/python before rebuilding to avoid stale cache effects.
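After installing the wheel, a quick sanity check, sketched here using only calls shown in this section, verifies that the package imports and a runtime option can be configured:

import fastdeploy as fd

# If the wheel is installed correctly, the module imports and a
# RuntimeOption can be constructed and configured without error.
option = fd.RuntimeOption()
option.use_cpu()
option.use_ort_backend()
print("fastdeploy wheel looks OK")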
Python Inference
import fastdeploy as fd
import numpy as np
# Download and extract the example model
model_url = "https://bj.bcebos.com/fastdeploy/models/mobilenetv2.tgz"
fd.download_and_decompress(model_url, path=".")
option = fd.RuntimeOption()
option.set_model_path("mobilenetv2/inference.pdmodel",
"mobilenetv2/inference.pdiparams")
# **** CPU configuration ****
option.use_cpu()
option.use_ort_backend()
option.set_cpu_thread_num(12)
# Initialize the runtime
runtime = fd.Runtime(option)
# Get the model's input name
input_name = runtime.get_input_info(0).name
# Run inference on random data
results = runtime.infer({
input_name: np.random.rand(1, 3, 224, 224).astype("float32")
})
print(results[0].shape)
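To run the same model on the Jetson GPU with the TensorRT backend instead, only the option setup changes. The sketch below is a variant under stated assumptions, not part of the original walkthrough: set_trt_input_shape pins the input shape so TensorRT can build its engine (the input name "inputs" is an assumption for this model; query it via runtime.get_input_info(0).name), and set_trt_cache_file caches the serialized engine so it is not rebuilt on every run.

import fastdeploy as fd
import numpy as np

option = fd.RuntimeOption()
option.set_model_path("mobilenetv2/inference.pdmodel",
                      "mobilenetv2/inference.pdiparams")
# **** GPU / TensorRT configuration ****
option.use_gpu(0)
option.use_trt_backend()
# Assumed input name "inputs"; check runtime.get_input_info(0).name.
option.set_trt_input_shape("inputs", [1, 3, 224, 224])
# Cache the built TensorRT engine to disk to skip rebuilding next time.
option.set_trt_cache_file("mobilenetv2.trt")

runtime = fd.Runtime(option)
input_name = runtime.get_input_info(0).name
results = runtime.infer({
    input_name: np.random.rand(1, 3, 224, 224).astype("float32")
})
print(results[0].shape)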