Versions

Python 3.6
TensorRT 5.0.2.6

Progress

First of all, here is a great introduction to TensorRT and how it works.

Float32

The official tutorial (sample) on how to accelerate yolov3 can be found in the TensorRT-5.0.2.6/samples/python/yolov3_onnx directory. It is easy to use; however, a few issues may need to be solved first.

  1. yolov3_to_onnx.py only works with Python 2
    A Python 2 environment can of course be set up for this, but if Python 3 is preferred, add this line after line 51:
remainder = remainder.decode("utf-8")
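For context, the error comes from Python 3 reading the .cfg file as bytes. Below is a minimal, hedged sketch of where the added line sits; the function and variable names follow the sample, but the body shown here is only an illustration and elides the actual section parsing.

# Hedged sketch of yolov3_to_onnx.py's cfg parsing: the .cfg file is opened
# in binary mode, so under Python 3 `remainder` is a bytes object and must be
# decoded before the string-based parsing that follows.
def parse_cfg_file(cfg_file_path):
    with open(cfg_file_path, 'rb') as cfg_file:
        remainder = cfg_file.read()
        remainder = remainder.decode("utf-8")  # the added line
    # ... section-by-section string parsing of `remainder` continues here ...
    return remainder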

Int8

In order to use Int8 inference, calibration is needed.
A general example of how to write a calibrator can be found here.
Set builder.int8_mode = True in the get_engine function, initialize the calibrator as shown in the example above (call it int8_calibrator), and then set builder.int8_calibrator = int8_calibrator.
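For concreteness, here is a minimal sketch of the change inside get_engine. Variable names follow the official sample; int8_calibrator is assumed to be an instance of a calibrator class such as the one sketched after the Issues list below.

# Inside get_engine()'s build_engine() in the official sample, after the
# builder has been created (names follow the sample; int8_calibrator is an
# assumed calibrator instance created beforehand).
builder.max_batch_size = 1
builder.max_workspace_size = 1 << 30   # as in the original sample
builder.int8_mode = True               # enable Int8 kernels
builder.int8_calibrator = int8_calibrator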

Issues:

  1. trt.infer.EntropyCalibrator doesn’t exist
    The API changed in TensorRT 5; replacing it with trt.IInt8EntropyCalibrator
    solves the issue.
  2. int(ptr) fails because ptr is a PyCapsule
    The pointer needs to be converted from a PyCapsule to an int. Refer to this link:
import ctypes

def convert_capsule_to_int(capsule):
    # PyCapsule_GetPointer unwraps the raw pointer stored in the capsule,
    # returning it as a plain integer that TensorRT can use directly.
    ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
    ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
    return ctypes.pythonapi.PyCapsule_GetPointer(capsule, None)
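Putting the pieces together, below is a minimal, hedged sketch of a calibrator for this sample. It assumes the 100 calibration images are already preprocessed into a single float32 NCHW numpy array; the class name, the cache file name, and the use of pycuda for the device buffer are assumptions for illustration, not part of the official sample.

import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

class YOLOEntropyCalibrator(trt.IInt8EntropyCalibrator):
    def __init__(self, calibration_data, batch_size, cache_file='calibration.cache'):
        trt.IInt8EntropyCalibrator.__init__(self)
        self.cache_file = cache_file
        self.batch_size = batch_size
        self.data = calibration_data          # float32 NCHW numpy array
        self.current_index = 0
        # Allocate device memory for one calibration batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        # Returning None tells TensorRT that the calibration data is exhausted.
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None
        batch = self.data[self.current_index:self.current_index + self.batch_size]
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        self.current_index += self.batch_size
        # If the device pointer comes back as a PyCapsule instead of an
        # int-convertible object, pass it through convert_capsule_to_int first.
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse an existing calibration cache to skip recalibration.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, 'rb') as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, 'wb') as f:
            f.write(cache)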

Performance

On a GTX 1070 Ti with 608x608 input:
with a PyTorch implementation, the inference time per image is about 35 ms;
with TensorRT float32 inference, the inference time per image is about 28 ms;
with TensorRT int8 inference, the inference time per image is about 15 ms.

More Issues

With Int8, 100 images are used for calibration. However, the detection accuracy drops noticeably compared with float32 inference.
Further experiments are needed to find out how to keep the speedup without losing accuracy.