This blog post is still a work in progress. If you require further clarifications before the contents are finalized, please get in touch with me here, on LinkedIn, or Twitter.

🚀 Motivation - Edge Deployment

It’s late 2023, and everyone seems to be talking about larger and more complex models.

Sophisticated models perform well at specific tasks, but they demand massive computational power.

That power is typically available only in cloud-based environments, which have their own limitations, such as latency, bandwidth constraints, and privacy concerns.

This is when edge deployment comes into play.

In simple terms, edge deployment means running a model close to the source of the data. For example, running a face recognition model on an iPhone.

Why edge deployment:

  1. Low Latency: Edge devices process data locally. This reduces the time it takes for a model to produce an output.

  2. Privacy: Your data stays on the device. This reduces the risk of data breaches and makes it easier to comply with data privacy regulations.

  3. Robustness: Edge devices can function with or without an internet connection. This provides reliability and robustness.


But there’s a caveat: edge devices often have limited computational resources.

This is why large models typically go through optimizations before they are deployed on edge devices. In this blog post, we’ll look into ONNX, OpenVINO and TFLite, some of the most popular deployment formats.

In this post, you’ll learn how to convert PyTorch Image Models (TIMM) into ONNX format, a crucial step in preparing your models for efficient edge deployment.


By the end of this post you’ll learn how to

  • Load any model from TIMM.
  • Convert the model into ONNX, OpenVINO and TFLite format.
  • Optimize the model to improve inference latency.

The code for this post is on my GitHub repo.

But first, let’s load a PyTorch computer vision model from TIMM.

🖼️ Torch Image Models (TIMM)

TIMM, or Torch Image Models, is a Python library that provides a collection of pre-trained machine learning models specifically designed for computer vision tasks.

To date, TIMM provides more than 1000 state-of-the-art computer vision models trained on various datasets. Many state-of-the-art models are also built using TIMM.

Install TIMM by running:

pip install timm

I’m using timm==0.9.7 in this post.

Presently there are 1212 models on timm as listed on Hugging Face.

Over a thousand pre-trained models on TIMM.


Once installed, load any model with 2 lines of code:

import timm
model = timm.create_model('convnextv2_base.fcmae_ft_in22k_in1k', pretrained=True)

Now, put the model in evaluation mode for inference.

model = model.eval()

Next, let’s load an image from the web.

from urllib.request import urlopen
from PIL import Image

img = Image.open(urlopen(''))  # image URL missing from the source

Download a random image from the web for inference.

Next, let’s get the model’s specific transforms:

data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

With the right transforms we can run an inference on the downloaded image.

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

And view the results:

import torch

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)

>>> tensor([[12.4517,  8.8304,  5.8010,  3.0997,  3.0730]], grad_fn=<TopkBackward0>)

>>> tensor([[968, 967, 969, 960, 504]])

>>> torch.Size([1, 1000])
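
To make the softmax and top-k step less of a black box, here is a minimal pure-Python sketch of what those two operations compute. The helper names `softmax` and `top_k` and the four example logits are made up for illustration; they are not part of TIMM or PyTorch, and `torch.topk` remains the right tool for real tensors.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(values, k):
    """Return (value, index) pairs for the k largest entries, largest first."""
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    return [(value, index) for index, value in ranked[:k]]

# Hypothetical logits for a 4-class model (not taken from the post).
logits = [2.0, 1.0, 0.1, 3.0]
percentages = [p * 100 for p in softmax(logits)]
top2 = top_k(percentages, k=2)
print(top2)
```

The indices returned by the top-k step are what get mapped to class names in the next step.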

To view the class names, we load the ImageNet class names and look up the indices returned by the inference.

from imagenet_classes import IMAGENET2012_CLASSES

# Retrieving class names
im_classes = list(IMAGENET2012_CLASSES.values())
class_names = [im_classes[i] for i in top5_class_indices[0]]

>>> ['cup', 'espresso', 'eggnog', 'chocolate sauce, chocolate syrup', 'coffee mug']

Now let’s measure the inference time on CPU.

import time
num_images = 100

with torch.inference_mode():
    start = time.perf_counter()
    for _ in range(num_images):
        # loop body missing from the source; reconstructed as one forward pass
        model(transforms(img).unsqueeze(0))
    end = time.perf_counter()
    time_taken = end - start

print(
    f"PyTorch model on CPU: {time_taken/num_images*1000:.3f} ms per image,\n"
    f"FPS: {num_images/time_taken:.2f}")

>>> PyTorch model on CPU: 109.419 ms per image,
>>> FPS: 9.14
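
The first few calls of a model are often slower than the rest, so it can help to exclude some warm-up runs from the measurement. Below is a small sketch of a reusable timing helper built on the same `time.perf_counter` approach. The `benchmark` function and its parameters are hypothetical names for illustration, and the lambda is only a stand-in for a real forward pass.

```python
import time

def benchmark(fn, num_runs=100, num_warmup=10):
    """Return (ms_per_call, calls_per_second) for a zero-argument callable."""
    for _ in range(num_warmup):  # warm-up runs are excluded from the timing
        fn()
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / num_runs * 1000, num_runs / elapsed

# Stand-in workload; in this post it would be the model forward pass.
ms_per_call, fps = benchmark(lambda: sum(range(10_000)), num_runs=50, num_warmup=5)
print(f"{ms_per_call:.3f} ms per call, {fps:.2f} calls/s")
```

The same helper could be reused to time each exported format under identical conditions.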

🏆 ONNX (Open Neural Network Exchange)

ONNX is an open format built to represent machine learning models.

ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

With ONNX, you can train a machine learning model in one framework (e.g. PyTorch) and use the trained model in another (e.g. TensorFlow).


💫 In short, ONNX offers two benefits that help edge deployment:

  • Interoperability - Develop in your preferred framework and not worry about deployment constraints.
  • Hardware access - ONNX compatible runtimes can maximize performance across hardware.

🔁 PyTorch to ONNX

Using TIMM export

python onnx_export.py vit_base_patch14_dinov2.lvd142m.onnx --model timm/vit_base_patch14_dinov2.lvd142m --opset 16 --num-classes 0 --reparam --verbose

Using torch export

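import torch.onnx
from timm.utils.model import reparameterize_model

model = reparameterize_model(model)
torch.onnx.export(model,
                 torch.rand(1, 3, 518, 518, requires_grad=True),
                 'vit_small_patch14_dinov2.lvd142m.onnx',  # output name assumed from the onnxsim step
                 input_names=['input'],
                 output_names=['output'],
                 dynamic_axes={'input' : {0 : 'batch_size'},
                               'output' : {0 : 'batch_size'}})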

Simplify the converted ONNX model.

pip install onnxsim -Uq
onnxsim vit_small_patch14_dinov2.lvd142m.onnx vit_small_patch14_dinov2.lvd142m_simplified.onnx

To run an inference in ONNX, install onnxruntime:

pip install onnxruntime

Then load the simplified model and run it on an image:

import numpy as np
import onnxruntime as ort
from PIL import Image
from urllib.request import urlopen

# Load ONNX model
session = ort.InferenceSession("vit_small_patch14_dinov2.lvd142m_simplified.onnx")

# Load an image
img = Image.open(urlopen(''))  # image URL missing from the source
img = img.convert('RGB')
img = img.resize((518, 518))
img_np = np.array(img).astype(np.float32)

# Convert data to the shape the ONNX model expects
input_data = np.transpose(img_np, (2, 0, 1))  # Convert to (C, H, W)
input_data = np.expand_dims(input_data, axis=0)  # Add a batch dimension

input_data.shape # (1, 3, 518, 518)

# Get input name from the model
input_name = session.get_inputs()[0].name

# Perform inference
output = session.run(None, {input_name: input_data})

# Extract output data (assuming model has a single output)
output_data = output[0]

output_data.shape  # (1, 384)
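
Note that the snippet above passes raw 0 to 255 pixel values to the ONNX model, while the TIMM transforms used earlier also scale and normalize the image. The sketch below shows one way to do that normalization in NumPy before inference. The `preprocess` helper is a hypothetical name, and the mean and standard deviation are the common ImageNet defaults, which is an assumption here; check the model’s data config for the real values. Resize interpolation and cropping from the TIMM transforms are not replicated.

```python
import numpy as np

# Assumed ImageNet statistics; check the model's data config for the real values.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    """Turn an (H, W, 3) uint8 RGB array into a normalized (1, 3, H, W) float32 batch."""
    x = img_hwc_uint8.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    x = (x - MEAN) / STD                          # per-channel normalization
    x = np.transpose(x, (2, 0, 1))                # (H, W, C) -> (C, H, W)
    return np.expand_dims(x, axis=0)              # add the batch dimension

# Random stand-in for a resized 518x518 RGB image.
dummy = np.random.randint(0, 256, size=(518, 518, 3), dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape, batch.dtype)
```

With a real image, `preprocess(np.array(img))` would take the place of the manual transpose and expand steps.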

👁️ Visualize Model with Netron

DINOv2 simplified ONNX model.


📜 PyTorch to TorchScript

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
optimized_traced_model = optimize_for_mobile(traced_script_module)

🪜 ONNX to OpenVINO

import openvino as ov
ov_model = ov.convert_model('dv2s_redo_simplified.onnx')

###### Option 1: Save to OpenVINO IR:

# save model to OpenVINO IR for later use
ov.save_model(ov_model, 'dv2s_redo_simplified.xml')

###### Option 2: Compile and infer with OpenVINO:

# compile model
compiled_model = ov.compile_model(ov_model)

# prepare input_data
import numpy as np
input_data = np.random.rand(1, 3, 224, 224)

# run inference
result = compiled_model(input_data)

💫 ONNX to TFLite with onnx2tf

onnx2tf is a tool to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC).
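
To illustrate the layout difference that the conversion has to handle, here is a minimal NumPy sketch of moving a tensor between NCHW (batch, channels, height, width) and NHWC (batch, height, width, channels). The helper names are hypothetical and this is not how onnx2tf is implemented; it only shows what the two layouts mean for the input you feed to each runtime.

```python
import numpy as np

def nchw_to_nhwc(x):
    """(N, C, H, W) -> (N, H, W, C), the layout TensorFlow and TFLite expect."""
    return np.transpose(x, (0, 2, 3, 1))

def nhwc_to_nchw(x):
    """(N, H, W, C) -> (N, C, H, W), the layout PyTorch and ONNX exports use."""
    return np.transpose(x, (0, 3, 1, 2))

nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)
nhwc = nchw_to_nhwc(nchw)
print(nchw.shape, "->", nhwc.shape)
```

In practice this means an input prepared for the ONNX model would need its axes reordered before it is passed to the converted TFLite model.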

💥 PyTorch to OpenVINO

import openvino.torch
model = torch.compile(model, backend='openvino')
# OR
model = torch.compile(model, backend='openvino_ts')


  • openvino - With this backend, Torch FX subgraphs are directly converted to OpenVINO representation without any additional PyTorch based tracing/scripting.
  • openvino_ts - With this backend, Torch FX subgraphs are first traced/scripted with PyTorch TorchScript, and then converted to OpenVINO representation.

import openvino as ov

# Create OpenVINO Core object instance
core = ov.Core()

# Convert model to openvino.runtime.Model object
ov_model = ov.convert_model(model)

# Save openvino.runtime.Model object on disk
ov.save_model(ov_model, f"{MODEL_NAME}_dynamic.xml")

# Load OpenVINO model on device
compiled_model = core.compile_model(ov_model, 'AUTO')

result = compiled_model(input_tensor)[0]

🏁 Wrap Up