info
This blog post is still a work in progress. If you require further clarifications before the contents are finalized, please get in touch with me here, on LinkedIn, or Twitter.
It’s late 2023, and everyone seems to be talking about larger, more complex models.
Sophisticated models perform well at specific tasks, but they come at the cost of massive computational power.
Typically, that power is only available in cloud-based environments, which come with limitations such as latency, bandwidth constraints, and privacy concerns.
This is when edge deployment comes into play.
In simple terms, edge deployment means running a model close to the source of the data. For example, running a face recognition model on an iPhone.
Why edge deployment:
Low Latency: Edge devices process data locally. This reduces the time it takes for a model to produce an output.
Privacy: Your data stays on the device. This reduces the risk of data breaches and improves compliance with data privacy regulations.
Robustness: Edge devices can function with or without an internet connection. This provides reliability and robustness.
note
But there’s a caveat: edge devices often have limited computational resources.
This is why large models typically go through optimizations before they are deployed on edge devices. In this blog post, we’ll look into ONNX, OpenVINO, and TFLite, some of the most popular deployment formats.
In this post, you’ll learn how to convert PyTorch Image Models (TIMM) into ONNX format, a crucial step in preparing your models for efficient edge deployment.
tip
By the end of this post, you’ll learn how to load a pre-trained model from TIMM, convert it into ONNX format, and run an inference with ONNX Runtime.
The code for this post is available on my GitHub repo.
But first, let’s load a PyTorch computer vision model from TIMM.
TIMM, or PyTorch Image Models, is a Python library that provides a collection of pre-trained machine learning models specifically designed for computer vision tasks.
To date, TIMM provides more than 1,000 state-of-the-art computer vision models trained on various datasets. Many state-of-the-art models are also built using TIMM.
Install TIMM by running:
pip install timm
I’m using version timm==0.9.7 in this post.
Presently there are 1212 models on timm as listed on Hugging Face.
Over a thousand pre-trained models on TIMM.
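You can also query the available models programmatically. Here’s a quick sketch; the exact count depends on your installed timm version:
import timm

# List all models that ship with pre-trained weights in this timm version
model_names = timm.list_models(pretrained=True)

print(len(model_names))   # total number of pre-trained models
print(model_names[:5])    # a few example model names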
Once installed, load any model with two lines of code:
import timm
model = timm.create_model('convnextv2_base.fcmae_ft_in22k_in1k',
pretrained=True)
Now, put the model in evaluation mode for inference.
model = model.eval()
Next, let’s load an image from the web.
from urllib.request import urlopen
from PIL import Image
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
Download a random image from the web for inference.
Next, let’s get the model-specific transforms.
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
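If you’re curious what the model expects, you can inspect data_config; it typically holds the input size, interpolation mode, and normalization statistics (the exact values depend on the model):
print(data_config)   # e.g. input_size, interpolation, crop_pct, mean, std
print(transforms)    # the composed transform pipeline built from data_config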
With the right transforms we can run an inference on the downloaded image.
output = model(transforms(img).unsqueeze(0)) # unsqueeze single image into batch of 1
And view the results:
import torch

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
top5_probabilities
>>> tensor([[12.4517, 8.8304, 5.8010, 3.0997, 3.0730]], grad_fn=<TopkBackward0>)
top5_class_indices
>>> tensor([[968, 967, 969, 960, 504]])
output.shape
>>> torch.Size([1, 1000])
To view the class names, we load the ImageNet class names and look up the corresponding indices from the inference results.
from imagenet_classes import IMAGENET2012_CLASSES
# Retrieving class names
im_classes = list(IMAGENET2012_CLASSES.values())
class_names = [im_classes[i] for i in top5_class_indices[0]]
class_names
>>> ['cup', 'espresso', 'eggnog', 'chocolate sauce, chocolate syrup', 'coffee mug']
Now let’s measure the inference time on CPU.
import time

num_images = 100

with torch.inference_mode():
    start = time.perf_counter()
    for _ in range(num_images):
        model(transforms(img).unsqueeze(0))
    end = time.perf_counter()

time_taken = end - start

print(
    f"PyTorch model on CPU: {time_taken/num_images*1000:.3f} ms per image,\n"
    f"FPS: {num_images/time_taken:.2f}")
>>> PyTorch model on CPU: 109.419 ms per image,
>>> FPS: 9.14
ONNX is an open format built to represent machine learning models.
ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
With ONNX, you can train a machine learning model in one framework (e.g. PyTorch) and use the trained model in another (e.g. TensorFlow).
note
💫 In short, ONNX offers two benefits that help edge deployment: interoperability across frameworks, and access to hardware-optimized runtimes and tools.
Using TIMM export
python onnx_export.py vit_base_patch14_dinov2.lvd142m.onnx --model timm/vit_base_patch14_dinov2.lvd142m --opset 16 --num-classes 0 --reparam --verbose
Using torch export
# Fuse/reparameterize layers for inference before export (same purpose as the --reparam flag above)
from timm.utils.model import reparameterize_model
model = reparameterize_model(model)
import torch.onnx

torch.onnx.export(model,                                            # model being exported
                  torch.rand(1, 3, 518, 518, requires_grad=True),   # example input tensor
                  "vit_small_patch14_dinov2.lvd142m.onnx",          # output file name
                  export_params=True,           # store trained weights inside the file
                  opset_version=16,             # ONNX opset to export to
                  do_constant_folding=True,     # fold constant expressions
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'},         # allow variable batch size
                                'output': {0: 'batch_size'}})
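Optionally, verify the exported graph before simplifying. Here’s a minimal check using the onnx package, assuming the export above succeeded:
import onnx

# Load the exported file and run ONNX's structural validity checker
onnx_model = onnx.load("vit_small_patch14_dinov2.lvd142m.onnx")
onnx.checker.check_model(onnx_model)
print("ONNX export passed the checker")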
Simplify the converted ONNX model with onnx-simplifier (onnxsim).
pip install onnxsim -Uq
onnxsim vit_small_patch14_dinov2.lvd142m.onnx vit_small_patch14_dinov2.lvd142m_simplified.onnx
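Equivalently, you can call onnx-simplifier from Python. A sketch, assuming the exported file from above:
import onnx
from onnxsim import simplify

# Simplify the graph (constant folding, redundant node removal, etc.)
onnx_model = onnx.load("vit_small_patch14_dinov2.lvd142m.onnx")
model_simplified, check = simplify(onnx_model)
assert check, "Simplified model could not be validated"
onnx.save(model_simplified, "vit_small_patch14_dinov2.lvd142m_simplified.onnx")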
To run an inference in ONNX, install onnxruntime:
pip install onnxruntime
import numpy as np
import onnxruntime as ort
from PIL import Image
from urllib.request import urlopen
# Load ONNX model
session = ort.InferenceSession("vit_small_patch14_dinov2.lvd142m_simplified.onnx")
# Load an image
img = Image.open(urlopen('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'))
img = img.convert('RGB')
img = img.resize((518, 518))
img_np = np.array(img).astype(np.float32)
# Convert data to the shape the ONNX model expects
input_data = np.transpose(img_np, (2, 0, 1)) # Convert to (C, H, W)
input_data = np.expand_dims(input_data, axis=0) # Add a batch dimension
input_data.shape # (1, 3, 518, 518)
# Get input name from the model
input_name = session.get_inputs()[0].name
# Perform inference
output = session.run(None, {input_name: input_data})
# Extract output data (assuming model has a single output)
output_data = output[0]
output_data.shape
# (1, 384)
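For a rough comparison with the PyTorch CPU baseline above, you can time the ONNX Runtime session in the same way. A sketch reusing session, input_name, and input_data from the snippet above; the numbers will vary with your hardware:
import time

num_images = 100

start = time.perf_counter()
for _ in range(num_images):
    session.run(None, {input_name: input_data})
end = time.perf_counter()
time_taken = end - start

print(
    f"ONNX model on CPU: {time_taken/num_images*1000:.3f} ms per image,\n"
    f"FPS: {num_images/time_taken:.2f}")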
DINOv2 simplified ONNX model visualized in Netron.app.
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model.eval()
example = torch.rand(1, 3, 224, 224)

# Trace the model with an example input to produce a TorchScript module
traced_script_module = torch.jit.trace(model, example)

# Apply mobile-specific graph optimizations and save for the lite interpreter
optimized_traced_model = optimize_for_mobile(traced_script_module)
optimized_traced_model._save_for_lite_interpreter("torchscript_edgenext_xx_small.pt")
import openvino as ov

ov_model = ov.convert_model('dv2s_redo_simplified.onnx')

# Option 1: Save to OpenVINO IR for later use
ov.save_model(ov_model, 'dv2s_redo_simplified.xml')

# Option 2: Compile and infer with OpenVINO
compiled_model = ov.compile_model(ov_model)

# Prepare input data
import numpy as np
input_data = np.random.rand(1, 3, 224, 224)

# Run inference
result = compiled_model(input_data)
onnx2tf is a tool to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC).
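A minimal sketch of that conversion, assuming onnx2tf and its TensorFlow dependencies are installed; the input file and output folder names below are placeholders:
pip install onnx2tf -Uq
onnx2tf -i vit_small_patch14_dinov2.lvd142m_simplified.onnx -o saved_model_tflite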
import openvino.torch
model = torch.compile(model, backend='openvino')
# OR
model = torch.compile(model, backend='openvino_ts')
note
openvino
- With this backend, Torch FX subgraphs are directly converted to OpenVINO representation without any additional PyTorch-based tracing/scripting.
openvino_ts
- With this backend, Torch FX subgraphs are first traced/scripted with PyTorch TorchScript, and then converted to OpenVINO representation.
import openvino as ov
# Create OpenVINO Core object instance
core = ov.Core()
# Convert model to openvino.runtime.Model object
ov_model = ov.convert_model(model)
MODEL_NAME = "DINOV2S"
# Save openvino.runtime.Model object on disk
ov.save_model(ov_model, f"{MODEL_NAME}_dynamic.xml")
# Load OpenVINO model on device
compiled_model = core.compile_model(ov_model, 'AUTO')
input_tensor = transforms(img).unsqueeze(0)
result = compiled_model(input_tensor)[0]
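As with the PyTorch and ONNX Runtime examples above, you can time the compiled OpenVINO model. A sketch reusing compiled_model and input_tensor from the snippet above; results depend on the device AUTO selects:
import time

num_images = 100

start = time.perf_counter()
for _ in range(num_images):
    compiled_model(input_tensor)
end = time.perf_counter()
time_taken = end - start

print(
    f"OpenVINO model (AUTO device): {time_taken/num_images*1000:.3f} ms per image,\n"
    f"FPS: {num_images/time_taken:.2f}")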