
[Question] Object Detection running with UMat and/or OpenCL target noticeably slower #117

Open
angryGoat500 opened this issue Aug 15, 2023 · 1 comment

Comments


angryGoat500 commented Aug 15, 2023

Hey everyone

I have a question regarding the Transparent API / Preferable Target and hope someone can help me understand.

My object detection program takes a lot longer to process images when using

```python
net.setPreferableTarget(cv2.dnn.DNN_TARGET_OPENCL_FP16)
```

or

```python
image = cv2.imread(filePath)  # note: cv2.COLOR_BGR2RGB is not a valid imread flag
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
uMat = cv2.UMat(image)
```

I've created 4 benchmark programs running sequentially, processing the same 10 .jpg files.

My baseline is a standard OpenCV object detection program that uses neither setPreferableTarget nor the UMat class for images.
The second sets the preferable target to cv2.dnn.DNN_TARGET_OPENCL_FP16.
The third converts images into UMat objects.
The fourth does both: it sets the preferable target to cv2.dnn.DNN_TARGET_OPENCL_FP16 and converts images into UMat objects.

I always measured the full processing time, starting before reading the image and ending after drawing the labels (excluding writing the output image or the detection log), as well as the model inference time with

```python
t, _ = net.getPerfProfile()
infTime = t / cv2.getTickFrequency()
```

The collected output is as follows:

Benchmark One Full Processing Time: 2.78063s
Benchmark One Model Inference Time: 1.030843s

Benchmark Two Full Processing Time: 3.2567s
Benchmark Two Model Inference Time: 1.12314s

Benchmark Three Full Processing Time: 12.76886s
Benchmark Three Model Inference Time: 10.83879s

Benchmark Four Full Processing Time: 13.43161047s
Benchmark Four Model Inference Time: 11.27375169s

Is there such a large gap between CPU and GPU execution because of the data transfer between the processing units? Am I missing something crucial?

If this large gap can indeed be explained by the data transfer, is there a way to "bundle" my workload to reduce the number of transfers?

I can provide the full code for these benchmark programs if that would be helpful.

Thanks in advance!

@angryGoat500 angryGoat500 changed the title Object Detection running with UMat and/or OpenCL target noticeably slower [Question] Object Detection running with UMat and/or OpenCL target noticeably slower Aug 15, 2023
doe300 (Owner) commented Aug 17, 2023

There are several reasons why execution via VC4CL could be slow, e.g.

  • The kernel is not very optimized
  • The kernel is very memory-bound (which is rather slow on the VideoCore IV GPU, esp. read/write-memory)
  • The measurement includes the kernel compilation time, which can take a while
  • ...

As to bundling the workload, I have no clue about OpenCV, but since OpenCV is the actual OpenCL client, I think the bundling would have to be done there...
