NEW - YOLOv8 🚀 Multi-Object Tracking #1429
@glenn-jocher does tracking support .onnx, .trt or OpenVINO .xml weights, instead of only .pt weights? |
@akashAD98 I haven't tested yet, but technically track mode supports whatever format predict mode supports. So yes, it supports .onnx, .trt and other formats. |
@akashAD98 @Laughing-q yes that's right! Tracking supports any detection or segmentation model in any of the following formats (TF.js is not supported for inference, but all other formats are). Available YOLOv8 export formats are in the table below. You can predict, track or val directly on exported models.
|
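As a sketch of that export-then-track workflow, assuming the ultralytics package (`TRACKABLE_SUFFIXES` and `is_trackable` are illustrative helpers, not library API, and the suffix set shown is only a sample of the full table):

```python
# Illustrative subset of export suffixes that track mode should accept
TRACKABLE_SUFFIXES = {".pt", ".onnx", ".engine", ".xml", ".torchscript"}

def is_trackable(path):
    """Illustrative check: does the weight file use a suffix track mode accepts?"""
    import os
    return os.path.splitext(path)[1].lower() in TRACKABLE_SUFFIXES

def export_and_track():
    from ultralytics import YOLO
    model = YOLO("yolov8n.pt")
    onnx_path = model.export(format="onnx")  # export() returns the exported file path
    assert is_trackable(onnx_path)
    # Track directly on the exported weights, same as with a .pt file
    return YOLO(onnx_path).track(source="video.mp4")

# call export_and_track() with a real video to run
```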
@zldrobit I've managed to include metadata additions into all YOLOv8 model formats above except for TF *.pb models. Do you know if this is possible for this format? The metadata is a dictionary here. For directory exports I simply place it as shown in ultralytics/ultralytics/yolo/engine/exporter.py, lines 214 to 224 in 30fc4b5.
|
If I pass OpenVINO weights, it's not supported.
Do I need to do different processing? @glenn-jocher |
@akashAD98 your OpenVINO usage is not aligned with the usage example we've shown (and that you've pasted). |
yes got it thanks |
@glenn-jocher the tracker is not working for custom-trained models.
The command I'm using:
|
@akashAD98 I just tested with a custom-trained model that detects human heads and it works fine for me, no errors:

```python
from ultralytics import YOLO

model = YOLO("best.pt")
results = model.track(
    source="test.mp4",
    conf=0.5,
    iou=0.5,
    show=False,
    device="cpu",
    save=True,
)
```

Your error seems like a cv2 issue related to the |
@Laughing-q I tried to install ultralytics, and by default it's using OpenCV. Which OpenCV version do I need to install?
|
@akashAD98 I suppose the |
@Laughing-q thanks, it's working fine for ByteTrack, but for the BoT-SORT tracker I'm getting that error. Also, I have one question: by default it's taking all files from ultralytics YOLO; for the custom model, do I need to pass the data.yaml file that has my custom class names? |
Sorry for the late reply. To the best of my knowledge, a GraphDef *.pb file does not include any meta information; it contains only the computation graph (network structure) and the name/weights of each node. There's no official TensorFlow tutorial on adding metadata to GraphDef *.pb files. However, it is possible to use the protobuf API |
@akashAD98 well you don't need to pass |
Greetings, I have a question since this is my first time working with object trackers: can the YOLOv8 built-in trackers be used for multi-object tracking on video frames read by OpenCV? I am trying to use the YOLOv8 built-in tracker for multi-object tracking, but I am unsure if it can be used on video frames read one by one from OpenCV using cap.read(), instead of a pre-existing full video or video stream. I have searched the documentation and the GitHub repository for YOLOv8, but I could not find any information on this topic. I would appreciate it if you could clarify whether this is possible. Thank you for your time and attention. I look forward to your responses. |
Hello! Yes, you can use the YOLOv8 Builtin Tracker for multi-object tracking on video frames read by OpenCV. The tracker can be initialized on a single frame and then updated on subsequent frames. Here is a brief overview of how you can do it:
Here is some sample code to get you started:

```python
import cv2
from yolov5 import Detector
from yolov5.utils.bbox import xyxy2xywh
from yolov5.utils.tracker import Tracker

# Initialize the detector and the tracker
detector = Detector(weights='yolov5s.pt')
tracker = Tracker(threshold=0.5)

# Open the video capture
cap = cv2.VideoCapture('test.mp4')
while cap.isOpened():
    # Read a frame
    ret, frame = cap.read()
    if not ret:
        break
    # Pass the frame through the detector and get the detections
    detections = detector.detect(frame)
    # Pass the detections to the tracker and update the tracks
    tracker.update(xyxy2xywh(detections))
    # Draw the tracks on the frame
    for track in tracker.tracks:
        cv2.rectangle
```
|
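The xyxy2xywh conversion used above is a simple coordinate transform; a self-contained sketch (not the library's actual implementation):

```python
def xyxy2xywh(box):
    """Convert an (x1, y1, x2, y2) corner box to (x_center, y_center, w, h)."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    h = y2 - y1
    # Center is the top-left corner shifted by half the width/height
    return (x1 + w / 2, y1 + h / 2, w, h)
```

Trackers typically work in center/size coordinates because box size and center motion are what their motion models (e.g. Kalman filters) predict frame to frame.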
Hi Glenn, thank you for your quick response. I appreciate your help. However, I noticed that the code you provided is for YOLOv5, while I am specifically looking to use the YOLOv8 built-in tracker. Is there an example with YOLOv8 trackers using the |
@mohamedamine99 yes, apologies for the confusion please see https://docs.ultralytics.com/modes/track for Python tracker usage :) |
thanks, I'll check it out |
You're welcome! Let us know if you have any further questions or concerns. We're always here to help. |
@glenn-jocher is it possible to use the tracker with cv2, instead of directly using model.track() as mentioned in the documentation? I want to print and get the details of each bounding box, score and label from the tracker using the cv2 method. Thanks |
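A minimal sketch of this frame-by-frame pattern, assuming the ultralytics package and a local test.mp4 (`format_track` is an illustrative helper, not library API); `persist=True` tells the tracker to keep its state, and therefore its IDs, between successive model.track() calls on individual frames:

```python
def format_track(box_xyxy, track_id, conf, label):
    """Illustrative helper: render one track as a printable string."""
    x1, y1, x2, y2 = (round(v, 1) for v in box_xyxy)
    return f"id={track_id} {label} {conf:.2f} ({x1},{y1},{x2},{y2})"

def main():
    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")
    cap = cv2.VideoCapture("test.mp4")
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # persist=True keeps tracker state across calls on single frames
        results = model.track(frame, persist=True)
        boxes = results[0].boxes
        if boxes.id is not None:  # id is None when no tracks exist yet
            for xyxy, tid, conf, cls in zip(boxes.xyxy, boxes.id,
                                            boxes.conf, boxes.cls):
                print(format_track(xyxy.tolist(), int(tid), float(conf),
                                   model.names[int(cls)]))
    cap.release()

# call main() to run on a real video
```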
@glenn-jocher thanks, that's exactly what I want to use for the YOLOv8 model and its trackers, BoT-SORT and ByteTrack. If I replace the YOLOv5 weights with YOLOv8 weights, will it work? |
Yes, if you replace the YOLOv5 weights with YOLOv8 weights, the model architecture should remain the same and the model should work as expected with the YOLOv8 weights. However, keep in mind that the performance and accuracy of the model may differ when using different weights. Also, make sure that the input image size and other hyperparameters are adjusted accordingly when switching between models. |
@RwGrid ah it's simpler than that, you can use model.track(), and if you want to specify a tracker you can use the tracker arg:

```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model.track(source='video.mp4', tracker='botsort.yaml', stream=True)
for result in results:
    ...  # process results
```
|
Hi @luan1412167, Thank you for your question! I'd be happy to provide some insights on the YOLOv8 Nano version.

The YOLOv8 Nano version, as the name suggests, is a smaller, lightweight model setup compared to its larger counterparts. The goal is to achieve decent performance on object detection tasks while being computationally efficient and hence suitable for edge devices or environments with limited computational resources.

In terms of architecture, YOLOv8 Nano usually consists of a backbone for feature extraction and a head for detecting objects and making predictions. The backbone is typically shallower compared to larger models due to the focus on computational efficiency. It usually consists of a smaller number of convolution layers, often coupled with attention mechanisms or other structural optimizations to maintain a balance between efficiency and performance. The exact count and configuration of the layers depends on the specific setup of YOLOv8 Nano.

The head of the network, responsible for object detection, performs the prediction task. It typically comprises additional layers that take features extracted from the backbone and generate object bounding box coordinates and class probabilities. Again, as part of the Nano setup, the design aims to be lightweight while keeping the model's predictive performance.

Unfortunately, I don't have the exact count of the layers in the backbone and head of YOLOv8 Nano, as it tends to vary slightly depending on the specific configuration. I hope this general explanation gives you some idea about the architecture of the YOLOv8 Nano version. If you have further questions, feel free to ask. |
@glenn-jocher Quite new to GitHub, so I hope I am right here :) |
Hi @phipsi369, Thanks for your question, I'd be happy to provide more details about the structure of YOLOv8 Nano.

Similar to other versions, YOLOv8 Nano consists of a backbone and a head. The backbone is used for feature extraction and the head is responsible for making predictions, including bounding boxes for object detection and class probabilities.

For the Nano version, the backbone is designed to be simpler and lighter to run efficiently on devices with lower computational power. Instead of using a Darknet-53 architecture (which is used in larger YOLO versions), it uses a much smaller network. The exact number of layers in the backbone can vary depending on the specific implementation and configuration of YOLOv8 Nano, but it is designed to be minimal.

The head of YOLOv8 Nano is responsible for predicting the bounding boxes and class probabilities, similar to other YOLO versions. It is composed of a few layers that take the output from the backbone, process it, and provide the final object detection results.

The model's prediction is based on the output from the head, which is processed from the features it received from the backbone. The backbone's purpose is to convert the input image into a rich set of features, and the head's purpose is to translate these features into detectable objects and their attributes.

The architecture of YOLOv8 Nano is designed to strike a good balance between inference speed and accuracy, allowing it to perform object detection reasonably well even on low-power devices. I hope this provides some insight into the structure of YOLOv8 Nano. Please feel free to ask if there's anything else you'd like to know! |
Thanks @glenn-jocher, it somewhat helps. However, I have no problems with detections. I am using the 8L model and tracking the movement and direction of cars, using ByteTrack so far. The detection works just fine, with some little tweaks to adjust. My problem lies in the correct allocation of IDs by the tracker: it often results in 2 or more cars being given the same ID after they move through the picture one after the other. Since I am evaluating each car based on its ID, this produces counting errors when an ID gets reassigned. However, I can't find a good adjustment for bytetrack.yaml and was wondering if you had an idea what to change. I already lowered track_buffer, but I think the problem lies in the track matching, since every car is on the same road and their movement through the picture is nearly identical. Any idea how to work around this? So far it's giving me nearly 20% errors. |
Hi @phipsi369, The YOLOv8 Nano is a more compact version of the YOLOv8 model designed to be computationally efficient while maintaining a good balance between speed and accuracy.

The backbone of YOLOv8 Nano typically consists of a modified version of the Darknet-53 architecture. However, it uses fewer layers to reduce the computational complexity and memory usage. The number of layers in the backbone can vary depending on the specific configuration, but it's generally significantly less than the 53 layers used in the full Darknet-53 architecture.

On the other hand, the head of the YOLOv8 Nano model, like the original YOLOv8 model, is responsible for predicting the object bounding boxes and class probabilities. It consists of convolutional, upsampling, and linear layers. It also utilizes multiple detection layers at different scales to increase the accuracy of detection for various object sizes.

The model's prediction layers are usually located within the head. These layers use the feature maps produced by the backbone and apply a set of transformations to generate the bounding box coordinates and class probability scores.

Please note that the precise model structure, including the number and type of layers in the backbone and head, can vary depending on the specific implementation and configuration of the YOLOv8 Nano model. Thank you for your understanding and let me know if you need more information. |
Hello, I am a student in Japan. I am a beginner and my knowledge is limited, so sorry if my English is not correct.
Detection
However, the tracking does not work. I am running the following source code and the detection is done successfully, but the tracking is not. I would like to know how to solve this problem. Thank you in advance for your help.
I respect you guys for being able to create such a great system. |
@20157m can you show the error message? |
Thank you for your reply. Tracking : I ask because I am not sure if there is a mistake in the source code or in the preparation of the tracking. |
When I have ID assignment errors, more than one object gets the same ID (it never happens with both objects in the frame at once). So let's say Object 1 is in the picture and assigned ID 1. Then Object 1 leaves the picture, and about 5 seconds later Object 2 enters the frame and also gets assigned ID 1. I tried lowering track_buffer in botsort.yaml as well as lowering new_track_thresh, but it doesn't change anything. At one point there were three objects in the picture within 20 seconds, all being assigned the same ID. Someone please help :( |
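As a toy illustration of the mechanism behind this (this is NOT the actual BoT-SORT/ByteTrack code; `ToyTracker` and all its names are hypothetical): a lost track is kept alive for `track_buffer` frames, and any new detection that overlaps the buffered box closely enough inherits its ID instead of getting a fresh one.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

class ToyTracker:
    def __init__(self, track_buffer=30, match_thresh=0.5):
        self.track_buffer = track_buffer  # frames a lost track survives
        self.match_thresh = match_thresh
        self.next_id = 1
        self.lost = []                    # (id, box, frames_since_seen)

    def step_empty(self):
        """Advance one frame with no detections; age out stale lost tracks."""
        self.lost = [(i, b, age + 1) for i, b, age in self.lost
                     if age + 1 <= self.track_buffer]

    def assign(self, box):
        """Give a detection an ID, reusing a buffered lost track if it matches."""
        for k, (tid, lbox, _) in enumerate(self.lost):
            if iou(box, lbox) >= self.match_thresh:
                del self.lost[k]
                return tid                # old ID re-attached to a new object
        tid = self.next_id
        self.next_id += 1
        return tid

    def mark_lost(self, tid, box):
        self.lost.append((tid, box, 0))
```

With a long track_buffer, a second car entering on the same lane soon after the first left will re-match the buffered track and inherit ID 1; with a shorter buffer the lost track expires first and the new car gets a fresh ID.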
Hi @20157m, Happy to provide some insight into the structure of the YOLOv8 Nano version! The exact architecture can depend on the particular configuration you're using, but generally, YOLO Nano has a lightweight and efficient structure suitable for deployments on systems with limited computational resources, like mobile or edge devices. The backbone, or feature extractor, in YOLO Nano typically has a significantly reduced number of layers compared to the larger YOLO versions. This can sometimes be a combination of convolutional and shortcut (residual) layers, similar to a miniaturized version of Darknet but with fewer layers. The head, which is responsible for making the final object detections, consists of a few additional layers. These layers would include a series of convolutional layers, up-sampling and concatenation operations to combine feature maps, and final detection layers. These final layers are where the network outputs bounding box coordinates, objectness scores, and class probabilities. Unfortunately, without an explicit model configuration at hand, it's not feasible to provide specific details about the exact count of layers, or which layers are specifically responsible for predictions. Remember that the exact structure will depend on the specific configuration of YOLOv8 Nano that you are using, and I would recommend reviewing the configuration file for your model to understand its structure in detail. I hope this gives you some general understanding of the architecture of YOLO Nano! Let me know if you have more questions. |
Hi @glenn-jocher, Thank you for your time. I will explain my recent situation. Sorry if I have said anything strange.
Using these data, we trained the yolov8l.pt model in the same way as the code in the previous question (#1429 (comment)). I got last.pt as a result and tried to see if I could actually use it to extract the fruit in the video. The result was not very accurate, but we were able to extract the fruits well, and we were able to have it judge the degree of ripeness on 5 levels. Next we went to #1429 (comment) and wanted to track with the original model. Then I ran into a problem.
If I run the code as above, it is not tracked, and the result is saved with only labels, without IDs etc., just as with model.predict(). Is my approach to tracking wrong? Must other training methods be used for tracking? I apologize if my understanding has not been up to par and I have not conveyed the necessary information. I am concerned that I have failed to present a clear model. |
Hi @20157m, Great question! The YOLO Nano version is a smaller and more efficient version of the YOLO model with fewer layers, developed with edge devices and lower computational power in mind.

The backbone of YOLO Nano is much smaller than the standard model. It is a modified version of the DarkNet architecture. Instead of using DarkNet-53 like the original YOLO, YOLO Nano has fewer layers to keep the model more lightweight, while still effectively extracting features from the input image.

The head of YOLO Nano includes the layers that are responsible for object detection, predicting the bounding boxes and their respective class probabilities. Similar to the backbone, the head of YOLO Nano is also smaller and more efficient compared to larger YOLO models.

As for the specific layers responsible for predictions, it isn't very straightforward because the entire model (including both backbone and head) works together to make predictions. Higher layers in the network handle broader features, while lower layers handle details. Each part of the structure contributes to the prediction process: the backbone extracts features from the images, which the head then uses to detect objects and calculate bounding boxes and confidences.

Remember that the exact number of layers in the backbone and head can vary depending on the specific version and configuration of YOLO Nano you're using. I hope this helps clarify your understanding of the YOLO Nano architecture! Let me know if you have any other questions. |
Hi @breadrone, I'm glad you found the previous explanation helpful. Let me provide some details about the YOLO Nano version. The YOLO Nano is a lightweight, efficient version of the YOLO model, specifically designed for edge computing devices with limited computational resources.

Similar to its counterparts, YOLO Nano also has a backbone and a head. The backbone is responsible for extracting features from the input image, and the head utilizes these features to make the final object detection predictions.

In terms of structure, the backbone of YOLO Nano is generally composed of several convolutional layers. It's important to note that the exact number of layers can vary depending on the specific implementation of YOLO Nano you are using. Typically, YOLO Nano employs far fewer layers in its backbone compared to the standard YOLO models in order to reduce computational complexity.

The head of YOLO Nano is responsible for predicting the bounding boxes and class probabilities. It contains additional layers that process the features extracted by the backbone to generate the final detections. These layers typically include a combination of convolutional, upsampling, and linear layers.

Unfortunately, without the specific implementation details, I cannot provide the exact count of layers and their functionalities. I would recommend taking a look at the architecture diagram of the YOLO Nano version you are using for a thorough understanding of its structure and specific layer responsibilities. I hope this provides a clearer picture of the YOLO Nano structure! If you have any more questions, feel free to ask. |
Hello @phipsi369, I'm glad to hear that you found the previous explanation helpful. Let's dive a bit into the YOLOv8 Nano version. The YOLOv8 Nano model has a simplified and lightweight design to support computer vision tasks on devices with limited computational resources.

As with the standard YOLOv8, the YOLOv8 Nano includes a backbone and a head. The backbone is responsible for feature extraction and is generally simpler than the one used in the full-sized model. Although the exact number of layers can vary, the backbone usually comprises a series of reduced convolutional layers.

The head of the YOLOv8 Nano model is designed to detect and classify objects in the input. Similar to the backbone, the head is streamlined and smaller in comparison to the full-sized model.

About the specific layers responsible for predictions: the output layer at the very end of the network produces the final bounding boxes and class probabilities. Each bounding box includes coordinates (x, y, width, height), an objectness score, and class probabilities. The exact configuration of layers in both the backbone and head can vary depending on different factors, including the characteristics and requirements of your specific application.

Remember that while the YOLOv8 Nano is smaller and faster, it generally won't perform as well as larger models in terms of accuracy due to its reduced complexity. However, it may provide a more efficient solution for scenarios where resources are constrained or where speed is of greater importance. I hope that helps! Let me know if you need more information or have any follow-up questions. |
Hello @20157m, YOLOv8 Nano, like all YOLO models, is a variant of the YOLO family optimised for smaller devices, hence the "Nano" specification. The architecture of YOLOv8 Nano, similar to other YOLO models, can be broken down into two main components: the backbone and the head.

The backbone is responsible for feature extraction. It consists of multiple convolutional layers aimed at extracting features from the input images at various spatial resolutions. While exact numbers can vary depending upon the specific implementation or customization, typically the backbone could consist of tens of layers, bringing the total into the range of 45-53 in some cases.

The head is the part responsible for prediction. It processes the features extracted by the backbone to perform the final object detection. The head includes additional layers that use the convolutional features to predict the bounding boxes and the class probabilities for every detected object in the image. There might be fewer layers in the head compared to the backbone; it also includes upsampling layers for feature map resolution recovery, and further convolutional layers directly responsible for bounding box and class predictions.

The specific functionality of the layers includes standard convolutional layers for feature extraction, pooling layers for downsampling the feature maps, normalization layers such as batch normalization for accelerating training and reducing overfitting, and the final detection layers which make predictions based on these extracted features.

That said, the specifics of the layers within the model architecture can vary depending on factors related to the exact implementation of YOLOv8 Nano or even potential training requirements and constraints. I hope this provides a good high-level overview of the YOLOv8 Nano's architecture! Feel free to ask if you have more questions or need further clarification on any points. |
lol lisa ))) <3 |
Hi @bharath5673, Thank you for your question! The YOLOv8 Nano, like its counterparts, follows a similar architectural layout, comprising a backbone and a detection head.

The backbone of YOLOv8 Nano is responsible for feature extraction. It consists of fewer layers compared to other YOLOv8 versions, making it highly suitable for resource-limited devices. The number of layers can vary depending on the specific implementation of YOLOv8 Nano, but typically it has fewer convolutional layers compared to other versions like YOLOv8 or YOLOv8-large.

The detection head of YOLOv8 Nano is where the actual prediction happens. It takes feature maps from the backbone and, through its layers, generates the bounding boxes, objectness scores, and class predictions. As with the backbone, the number of layers depends on the specific implementation, but typically would include convolution layers and upsampling layers.

To visualize the exact architecture and to see the specific layers that are involved in prediction, you can check the configuration (.cfg) file for YOLOv8 Nano. This file provides a detailed layer-by-layer structure of the model, so you can see the specifics of the backbone and the detection head, including types of layers, their order, and their hyperparameters. I hope this gives you a better understanding of YOLOv8 Nano's structure. Don't hesitate to reach out if you have more questions on this! |
Hello, I'm kinda new to training computer vision models. |
Thank you for your question. YOLO Nano is a smaller and more efficient version of the YOLO models, specifically designed for edge devices with limited computational resources. The YOLO Nano architecture consists of a backbone and a head, similar to other YOLO models.

The backbone is responsible for feature extraction. It's a smaller and streamlined version compared to larger YOLO models and is designed to be very efficient. The exact number of layers can vary, but it typically contains fewer convolutional layers to reduce the model's complexity and computational requirements.

The head of the model is responsible for detecting objects and predicting bounding boxes and class probabilities based on the feature maps produced by the backbone. The head also includes several layers; however, it's less complex than in the larger YOLO models. The layers in the head include convolutional layers for bounding box regression and class prediction, as well as additional layers for other tasks like anchor box assignment and non-maximum suppression.

Regarding the specific layers responsible for predictions, it's typically the final layers in the head. These layers take the feature maps from the backbone and generate the final predictions for the bounding box coordinates and class probabilities.

The YOLO Nano architecture is designed to strike a balance between accuracy and efficiency, making it suitable for edge devices and real-time applications where resource constraints are a major consideration. I hope this provides a basic understanding of the YOLO Nano model structure. Do let me know if you need further clarification. |
hi, |
|
hi |
Hi, Thanks |
Hi @glenn-jocher , Thank you |
I have reproduced the results of yolov8s-pose, and my experimental numbers are very different from yours, with mAP50 dropping by 20 points. My code and data are exactly the same as what you described, and the operation steps also strictly follow yours. Therefore, I may have a problem with the hyperparameter settings. |
Thank you @glenn-jocher for such great open-source work; it is quite necessary for custom training. |
How do I use TensorBoard to check bn_weight in YOLOv8?
I want to do YOLOv8 tracking using an .onnx model. Can anyone help me? |
Hi! |
I am getting a similar error: Code:
Please help!! |
YOLOv8 Multi-Object Tracking
Object tracking is a task that involves identifying the location and class of objects, then assigning a unique ID to that detection in video streams.
The output of the tracker is the same as detection, with an added object ID.
Available Trackers
The following tracking algorithms have been implemented and can be enabled by passing tracker=tracker_type.yaml:
botsort.yaml
bytetrack.yaml
The default tracker is BoT-SORT.
Tracking
Use a trained YOLOv8n/YOLOv8n-seg model to run tracker on video streams.
Python
CLI
As shown above, we support both detection and segmentation models for tracking; the only thing you need to do is load the corresponding (detection or segmentation) model.
Configuration
Tracking
Tracking shares its configuration with predict mode, i.e. conf, iou and show. For more configuration options, please refer to the predict page.
Python
CLI
yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show
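A Python equivalent of the CLI call above can be sketched as follows, assuming the ultralytics package (`track_args` is just an illustrative helper for collecting the overrides, not library API):

```python
def track_args(conf=0.3, iou=0.5, show=True):
    """Illustrative helper: collect the tracking overrides as a kwargs dict."""
    return {"conf": conf, "iou": iou, "show": show}

def run():
    from ultralytics import YOLO
    model = YOLO("yolov8n.pt")
    # Same settings as the CLI example above
    return model.track(source="https://youtu.be/Zgi9g1ksQHc", **track_args())

# call run() to start tracking
```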
Tracker
We also support using a modified tracker config file: just copy a config file, e.g. custom_tracker.yaml, from ultralytics/tracker/cfg and modify any configurations (except the tracker_type) you need to.
Python
CLI
CLI
Please refer to ultralytics/tracker/cfg page.