Source code interpretation of DeepSort

This article is a bit long...

Code repository: https://github.com/ZQPei/deep_sort_pytorch

The Tracker class manages multiple tracks. In each frame it performs two operations, prediction and update:

self.tracker.predict()
self.tracker.update(detections)

In the prediction stage, the tracker calls predict on every track, which does the following:

- Kalman filter prediction
- age += 1
- time_since_update += 1 (this variable counts the frames since the track was last updated with a matched detection)

The code is as follows:

def predict(self, kf):
    """Propagate the state distribution to the current time step using a
    Kalman filter prediction step.

    Parameters
    ----------
    kf : kalman_filter.KalmanFilter
        The Kalman filter.

    """
    self.mean, self.covariance = kf.predict(self.mean, self.covariance)
    self.age += 1
    self.time_since_update += 1
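The following is a minimal constant-velocity Kalman predict step in NumPy that makes the prediction concrete. It is a sketch, not the repository's KalmanFilter. It assumes the 8-dimensional state (x, y, a, h, vx, vy, va, vh) and a time step of one frame. It also uses a hypothetical fixed diagonal process noise `q_diag`, whereas the repository's filter derives its noise from the current box height.

```python
import numpy as np

NDIM = 4   # measured part of the state: (x, y, a, h)
DT = 1.0   # one frame per step

# Constant-velocity transition matrix: position += velocity * DT
F = np.eye(2 * NDIM)
for i in range(NDIM):
    F[i, NDIM + i] = DT

def kf_predict(mean, covariance, q_diag):
    """Propagate the state mean and covariance one step ahead (sketch)."""
    Q = np.diag(q_diag)                       # hypothetical fixed process noise
    mean = F @ mean                           # x' = F x
    covariance = F @ covariance @ F.T + Q     # P' = F P F^T + Q
    return mean, covariance

mean = np.array([359.5, 269.5, 0.3, 345.0, 2.0, -1.0, 0.0, 0.5])
cov = np.eye(8)
mean, cov = kf_predict(mean, cov, q_diag=np.full(8, 0.01))
print(mean[:4])  # the position part has moved by one step of velocity
```

Prediction grows the position uncertainty, because the velocity uncertainty feeds into it through F. The update step later shrinks it again when a detection is matched.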

The tracker update stage updates multiple tracks:

- matching tracks with detections
- updating the tracks
- updating the distance metric

The code is as follows:

def update(self, detections):
    """Perform measurement update and track management.

    Parameters
    ----------
    detections : List[deep_sort.detection.Detection]
        A list of detections at the current time step.

    """
    # Run matching cascade.
    matches, unmatched_tracks, unmatched_detections = \
        self._match(detections)
    # debug print
    print("matches:", matches, "unmatched_tracks:", unmatched_tracks,
          "unmatched_detections:", unmatched_detections)

    # Update track set.
    for track_idx, detection_idx in matches:
        self.tracks[track_idx].update(
            self.kf, detections[detection_idx])
    for track_idx in unmatched_tracks:
        self.tracks[track_idx].mark_missed()
    for detection_idx in unmatched_detections:
        self._initiate_track(detections[detection_idx])
    self.tracks = [t for t in self.tracks if not t.is_deleted()]

    # Update distance metric.
    active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
    features, targets = [], []
    for track in self.tracks:
        if not track.is_confirmed():
            continue
        features += track.features
        print("1 features", track.track_id, np.array(features).shape)  # debug print
        targets += [track.track_id for _ in track.features]
        print("1 targets_id", track.track_id, targets)  # debug print
        track.features = []
    self.metric.partial_fit(
        np.asarray(features), np.asarray(targets), active_targets)

First frame

The detection results are as follows:

 det [array([307,  97, 105, 345]), array([546, 151,  72, 207]), array([215, 154,  59, 184]), array([400, 181,  45, 126])]

After the detections are obtained, the tracker enters the track prediction stage. The first frame has no tracks yet, so nothing is predicted:

for track in self.tracks:
    track.predict(self.kf)

Then the tracker enters the track update stage, which starts by matching the detections against the tracks:

matches, unmatched_tracks, unmatched_detections = \
    self._match(detections)

Inside _match, the tracks are first split into confirmed_tracks and unconfirmed_tracks:

confirmed_tracks = [
    i for i, t in enumerate(self.tracks) if t.is_confirmed()]
unconfirmed_tracks = [
    i for i, t in enumerate(self.tracks) if not t.is_confirmed()]

Since there are no tracks yet, both lists are empty:

 confirmed_track: [] unconfirmed_track: []

Cascade matching is then run on the confirmed tracks:

matches_a, unmatched_tracks_a, unmatched_detections = \
    linear_assignment.matching_cascade(
        gated_metric, self.metric.matching_threshold, self.max_age,
        self.tracks, detections, confirmed_tracks)

With no tracks, there are no matches and no unmatched tracks, only unmatched detections:

 matches_a [] unmatched_track_a [] unmatched_detections  [0, 1, 2, 3]

The later IOU matching step has no candidate tracks either, so every detection stays unmatched and a new track is created for each one:

for detection_idx in unmatched_detections:
    self._initiate_track(detections[detection_idx])

The initialization code is as follows:

def _initiate_track(self, detection):
    mean, covariance = self.kf.initiate(detection.to_xyah())
    self.tracks.append(Track(
        mean, covariance, self._next_id, self.n_init, self.max_age,
        detection.feature))
    self._next_id += 1
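_initiate_track passes detection.to_xyah() to kf.initiate. The sketch below shows that conversion and a zero-velocity initial state. It assumes the detection box is stored as (top-left x, top-left y, width, height), as the tlwh naming suggests. `tlwh_to_xyah` and `initial_mean` are hypothetical names, and the initial covariance is omitted.

```python
import numpy as np

def tlwh_to_xyah(tlwh):
    """Convert (top-left x, top-left y, w, h) to (center x, center y, w/h, h)."""
    ret = np.asarray(tlwh, dtype=float).copy()
    ret[:2] += ret[2:] / 2   # move the top-left corner to the box center
    ret[2] /= ret[3]         # aspect ratio = width / height
    return ret

def initial_mean(tlwh):
    """Initial 8-D Kalman state: measured (x, y, a, h) plus zero velocity."""
    return np.r_[tlwh_to_xyah(tlwh), np.zeros(4)]

# First detection of the first frame, assuming it is in tlwh format
print(initial_mean([307, 97, 105, 345]))
```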

Each new track is built by the Track class. self._next_id is then incremented so that the next track receives a new, unique ID.

The properties initialized by each track are as follows:

self.mean = mean
self.covariance = covariance
self.track_id = track_id
self.hits = 1
self.age = 1
self.time_since_update = 0

self.state = TrackState.Tentative
self.features = []
if feature is not None:
    self.features.append(feature)

self._n_init = n_init
self._max_age = max_age

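A newly initialized track is Tentative, with hits=1, age=1 and time_since_update=0. Its features list holds the feature of the detection that created it, if one was provided.

A track remains Tentative until it has accumulated n_init hits (3 here), that is, until it has been matched in the two frames following its creation; it then becomes Confirmed. A Tentative track that misses a single frame is deleted. A Confirmed track is deleted once it has gone more than max_age frames (30 here) without an update.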
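The lifecycle rules can be sketched as a small state machine. ToyTrack is a hypothetical, stripped-down stand-in for the Track class with no Kalman filter and no features. It uses n_init=3 and max_age=30 as quoted in this article.

```python
from enum import Enum

class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3

class ToyTrack:
    """Lifecycle-only sketch of a DeepSort track (no Kalman filter, no features)."""

    def __init__(self, n_init=3, max_age=30):
        self.hits = 1                 # the creating detection counts as the first hit
        self.age = 1
        self.time_since_update = 0
        self.state = TrackState.TENTATIVE
        self._n_init = n_init
        self._max_age = max_age

    def predict(self):
        self.age += 1
        self.time_since_update += 1

    def update(self):
        self.hits += 1
        self.time_since_update = 0
        if self.state == TrackState.TENTATIVE and self.hits >= self._n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        if self.state == TrackState.TENTATIVE:
            self.state = TrackState.DELETED   # a single miss deletes a tentative track
        elif self.time_since_update > self._max_age:
            self.state = TrackState.DELETED   # confirmed track unseen for too long

# Created in frame 1, matched in frames 2 and 3: confirmed at frame 3
track = ToyTrack()
for _ in range(2):
    track.predict()
    track.update()
print(track.state, track.hits, track.age)

# Created, then missed in the very next frame: deleted immediately
short_lived = ToyTrack()
short_lived.predict()
short_lived.mark_missed()
print(short_lived.state)
```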

Second frame

Detection results:

 det [array([227, 152,  52, 189]), array([546, 153,  66, 203]), array([ 35,  52, 114, 466]), array([339, 130,  92, 278]), array([273, 134,  90, 268])]

The first frame produced 4 tracks. Each of them now enters the predict stage: Kalman prediction is performed, and age and time_since_update are each incremented by 1:

def predict(self, kf):
    """Propagate the state distribution to the current time step using a
    Kalman filter prediction step.

    Parameters
    ----------
    kf : kalman_filter.KalmanFilter
        The Kalman filter.

    """
    self.mean, self.covariance = kf.predict(self.mean, self.covariance)
    self.age += 1
    self.time_since_update += 1

At this point, the age and time_since_update of each track are:

 track update age 2 time_since_update 1
track update age 2 time_since_update 1
track update age 2 time_since_update 1
track update age 2 time_since_update 1

After prediction, the tracker enters the track update stage.

The current frame's detections are matched against the existing tracks. First, the tracks are split into confirmed_tracks and unconfirmed_tracks:

 confirmed_track [] unconfirmed_track [0, 1, 2, 3]

Since confirmed_track is empty, the cascade matching result is:

 matches_a [] unmatched_track_a [] unmatched_detections  [0, 1, 2, 3, 4]

Next, the IOU candidate tracks are formed. They consist of the unconfirmed tracks plus those tracks in unmatched_tracks_a (the cascade's unmatched tracks) whose time_since_update == 1, i.e. tracks that were updated in the previous frame. The remaining tracks stay in unmatched_tracks_a:

iou_track_candidates = unconfirmed_tracks + [
    k for k in unmatched_tracks_a if
    self.tracks[k].time_since_update == 1]

unmatched_tracks_a = [
    k for k in unmatched_tracks_a if
    self.tracks[k].time_since_update != 1]

The candidate tracks and unmatched_tracks_a are:

 iou_track_candidates [0, 1, 2, 3],unmatched_track_a []
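The candidate split depends only on each track's time_since_update. The standalone sketch below uses a hypothetical dict mapping track index to time_since_update in place of self.tracks. It is checked against the second frame plus one made-up case.

```python
def split_iou_candidates(unconfirmed, unmatched_a, time_since_update):
    """Return (iou_track_candidates, remaining unmatched_tracks_a).

    time_since_update: hypothetical dict {track_idx: frames since last update}.
    """
    recent = [k for k in unmatched_a if time_since_update[k] == 1]
    stale = [k for k in unmatched_a if time_since_update[k] != 1]
    return unconfirmed + recent, stale

# Second frame: all four tracks are unconfirmed and the cascade matched nothing
print(split_iou_candidates([0, 1, 2, 3], [], {0: 1, 1: 1, 2: 1, 3: 1}))

# Hypothetical case: confirmed tracks 5 and 6 missed cascade matching,
# but only track 5 was updated in the previous frame
print(split_iou_candidates([4], [5, 6], {4: 1, 5: 1, 6: 3}))
```

Track 6 never reaches IOU matching, so it can only be picked up again by appearance in a later cascade.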

IOU matching is then performed between the candidate tracks and the unmatched detections:

matches_b, unmatched_tracks_b, unmatched_detections = \
    linear_assignment.min_cost_matching(
        iou_matching.iou_cost, self.max_iou_distance, self.tracks,
        detections, iou_track_candidates, unmatched_detections)

The result of the IOU match is:

 matches_b [(0, 0), (1, 1), (2, 2), (3, 3)] unmatches_track_b [] unmatched_detections [4]

Finally, the results are merged. The matches from cascade matching and from IOU matching together form the final matches. The tracks left in unmatched_tracks_a (those with time_since_update != 1) plus the tracks that IOU matching failed to match form the final unmatched tracks.

So a confirmed track that was updated in the previous frame goes through both cascade matching and IOU matching. A confirmed track that was not updated in the previous frame only goes through cascade matching and otherwise becomes an unmatched track directly. The likely rationale is that a track updated in the previous frame is more likely to be updated again in the current frame.

matches = matches_a + matches_b
unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))

The final result is:

 matches: [(0, 0), (1, 1), (2, 2), (3, 3)] unmatched_tracks: [] unmatched_detections: [4]

Matching therefore produces three results: matched (track, detection) pairs, unmatched tracks, and unmatched detections.

Next comes the track update stage.

For each matched pair, the track is updated with its detection:

for track_idx, detection_idx in matches:
    self.tracks[track_idx].update(
        self.kf, detections[detection_idx])

Updating a track involves:

- a Kalman filter update
- storing the detection's appearance feature (each track keeps a list of features for appearance matching)
- hits += 1
- resetting time_since_update to 0
- updating the state: a Tentative track becomes Confirmed once hits >= n_init

def update(self, kf, detection):
    """Perform Kalman filter measurement update step and update the feature
    cache.

    Parameters
    ----------
    kf : kalman_filter.KalmanFilter
        The Kalman filter.
    detection : Detection
        The associated detection.

    """
    self.mean, self.covariance = kf.update(
        self.mean, self.covariance, detection.to_xyah())
    self.features.append(detection.feature)

    self.hits += 1
    self.time_since_update = 0
    if self.state == TrackState.Tentative and self.hits >= self._n_init:
        self.state = TrackState.Confirmed
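kf.update performs a standard Kalman measurement update in which only the (x, y, a, h) part of the state is observed. The minimal NumPy sketch below assumes a hypothetical fixed measurement-noise diagonal `r_diag`; the repository's KalmanFilter scales its noise with the box height instead. `kf_update` is a hypothetical name.

```python
import numpy as np

H = np.hstack([np.eye(4), np.zeros((4, 4))])   # observe (x, y, a, h) only

def kf_update(mean, covariance, measurement, r_diag):
    """Kalman measurement update (sketch with fixed measurement noise)."""
    R = np.diag(r_diag)                            # hypothetical fixed noise
    S = H @ covariance @ H.T + R                   # innovation covariance
    # Kalman gain K = P H^T S^-1, computed with a solve instead of an inverse
    K = np.linalg.solve(S, H @ covariance).T
    innovation = np.asarray(measurement, dtype=float) - H @ mean
    new_mean = mean + K @ innovation
    new_cov = covariance - K @ S @ K.T
    return new_mean, new_cov

# Equal prior and measurement uncertainty: the estimate lands halfway
mean0, cov0 = np.zeros(8), np.eye(8)
new_mean, new_cov = kf_update(mean0, cov0, [10.0, 20.0, 0.5, 100.0], np.ones(4))
print(new_mean[:4], np.diag(new_cov))
```

With identity prior covariance, position and velocity are uncorrelated, so this toy update leaves the velocity part untouched. In a real track, the prediction step introduces that correlation.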

In this frame every track is matched, so each track's time_since_update is now 0.

Each unmatched track is marked as missed. A Tentative track is set to Deleted immediately. Any other track is set to Deleted only if it has gone too long without an update (time_since_update > max_age).

for track_idx in unmatched_tracks:
    self.tracks[track_idx].mark_missed()

def mark_missed(self):
    """Mark this track as missed (no association at the current time step).
    """
    if self.state == TrackState.Tentative:
        self.state = TrackState.Deleted
    elif self.time_since_update > self._max_age:
        self.state = TrackState.Deleted

For each unmatched detection, a new track is created:

for detection_idx in unmatched_detections:
    self._initiate_track(detections[detection_idx])

Finally, all tracks in the Deleted state are removed:

self.tracks = [t for t in self.tracks if not t.is_deleted()]

Third frame

Detection results:

 [array([307, 105, 108, 325]), array([547, 148,  70, 211]), array([216, 151,  59, 190]), array([402, 183,  43, 124]), array([ 35,  87,  70, 376])]

Tracking proceeds as in the previous frame, and these detections can be matched to the existing tracks. After prediction, the age and time_since_update of each track are:

 track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 3 time_since_update 1
track update age 2 time_since_update 1

After matching, the track update turns tracks 0 to 3 into Confirmed tracks, since each has now been matched in three frames (hits=3).

Let's look directly at the fourth frame.

Fourth frame

Detection results:

 [array([318, 119, 105, 301]), array([545, 146,  71, 215]), array([216, 151,  59, 192]), array([ 30,  75,  82, 398]), array([403, 185,  41, 121])]

After the detections are obtained, the prediction stage runs. Each track performs a Kalman prediction and increments age and time_since_update:

 track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 4 time_since_update 1
track update age 3 time_since_update 1

After prediction, the tracker enters the track update stage.

The first step is matching detections to tracks. As before, this starts by splitting the tracks into confirmed_tracks and unconfirmed_tracks:

 confirmed_t [0, 1, 2, 3] unconfirmed [4]

Track 4 was created from a detection first seen in the second frame and has been matched only once since (hits=2), so it is still unconfirmed.

Cascade matching is then performed on the confirmed tracks.

First, the index lists are set up. matching_cascade receives confirmed_tracks as track_indices, and when no detection indices are passed, all detections are used:

if track_indices is None:
    track_indices = list(range(len(tracks)))
if detection_indices is None:
    detection_indices = list(range(len(detections)))

The result is:

 track_indices [0, 1, 2, 3] detection_indices [0, 1, 2, 3, 4]

At level=0, track_indices_l contains the tracks whose time_since_update is 1, and matching them yields matches_l. At level=1, track_indices_l contains the tracks whose time_since_update is 2. These are matched against the detections still left unmatched, the result is merged with the earlier matches, and so on. In other words, tracks are matched from the most recently updated to the least recently updated, which gives the most recently updated tracks priority.

unmatched_detections = detection_indices
matches = []
for level in range(cascade_depth):
    if len(unmatched_detections) == 0:  # No detections left
        break

    track_indices_l = [
        k for k in track_indices
        if tracks[k].time_since_update == 1 + level
    ]
    if len(track_indices_l) == 0:  # Nothing to match at this level
        continue

    matches_l, _, unmatched_detections = \
        min_cost_matching(
            distance_metric, max_distance, tracks, detections,
            track_indices_l, unmatched_detections)
    matches += matches_l
unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
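The cascade logic can be exercised outside the repository. This sketch makes three simplifications:

- it takes a precomputed cost matrix instead of a distance_metric callback,
- it uses a hypothetical dict of time_since_update values instead of Track objects,
- it uses SciPy's `linear_sum_assignment` for the Hungarian step.

The function names mirror the text but are hypothetical re-implementations. The toy example shows the point of the cascade: a recently updated track wins a detection even though a stale track has a lower cost for it.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def min_cost_matching(cost, max_distance, track_indices, detection_indices):
    """Hungarian matching on a sub-matrix of `cost`, rejecting pairs above max_distance."""
    if not track_indices or not detection_indices:
        return [], list(track_indices), list(detection_indices)
    sub = cost[np.ix_(track_indices, detection_indices)]
    sub = np.where(sub > max_distance, max_distance + 1e-5, sub)
    rows, cols = linear_sum_assignment(sub)
    matches = [(track_indices[r], detection_indices[c])
               for r, c in zip(rows, cols) if sub[r, c] <= max_distance]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    return (matches,
            [t for t in track_indices if t not in matched_t],
            [d for d in detection_indices if d not in matched_d])

def matching_cascade(cost, max_distance, cascade_depth, time_since_update,
                     track_indices, detection_indices):
    """Match tracks level by level, most recently updated first."""
    unmatched_detections = list(detection_indices)
    matches = []
    for level in range(cascade_depth):
        if not unmatched_detections:          # no detections left
            break
        track_indices_l = [k for k in track_indices
                           if time_since_update[k] == 1 + level]
        if not track_indices_l:               # nothing to match at this level
            continue
        matches_l, _, unmatched_detections = min_cost_matching(
            cost, max_distance, track_indices_l, unmatched_detections)
        matches += matches_l
    matched = {k for k, _ in matches}
    unmatched_tracks = [k for k in track_indices if k not in matched]
    return matches, unmatched_tracks, unmatched_detections

# Track 0 is stale (missed a frame) but closer; track 1 was updated last frame.
cost = np.array([[0.05],
                 [0.15]])
print(matching_cascade(cost, 0.2, 30, {0: 2, 1: 1}, [0, 1], [0]))
```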

After cascading matching, the result is:

 matches_a [(0, 0), (1, 1), (2, 2), (3, 4)] unmatched_track_a [] unmatched_detections  [3]

One unmatched det is left.

The IOU candidate tracks are the unconfirmed tracks plus the tracks left unmatched by cascade matching whose time_since_update == 1:

 iou_track_candidates [4]

The candidate tracks are IOU matched with the unmatched det, and the results are as follows:

 matches_b [(4, 3)] unmatches_track_b [] unmatched_detections []

The final result is as follows:

 matches: [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)] unmatched_tracks: [] unmatched_detections: []

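After the tracks are updated, the distance metric is updated. The features collected by each confirmed track are moved into the metric's samples map (track_id -> list of features), and each confirmed track's own feature list is cleared. In partial_fit, each track's list is capped at budget entries and tracks that are no longer confirmed are dropped from the map: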

active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
features, targets = [], []
for track in self.tracks:
    if not track.is_confirmed():
        continue
    features += track.features
    targets += [track.track_id for _ in track.features]
    track.features = []
self.metric.partial_fit(
    np.asarray(features), np.asarray(targets), active_targets)

def partial_fit(self, features, targets, active_targets):
    for feature, target in zip(features, targets):
        self.samples.setdefault(target, []).append(feature)
        if self.budget is not None:
            self.samples[target] = self.samples[target][-self.budget:]

    self.samples = {k: self.samples[k] for k in active_targets}
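The samples map acts as a per-track feature gallery. The sketch below replays the same bookkeeping outside the repository. FeatureGallery is a hypothetical name, and the budget caps how many recent features each track keeps.

```python
import numpy as np

class FeatureGallery:
    """Per-track appearance gallery with the same bookkeeping as partial_fit (sketch)."""

    def __init__(self, budget=None):
        self.budget = budget    # max features kept per track (None = unlimited)
        self.samples = {}       # track_id -> list of feature vectors

    def partial_fit(self, features, targets, active_targets):
        for feature, target in zip(features, targets):
            self.samples.setdefault(target, []).append(feature)
            if self.budget is not None:
                # keep only the most recent `budget` features
                self.samples[target] = self.samples[target][-self.budget:]
        # keep only the tracks listed as active (the confirmed tracks)
        self.samples = {k: self.samples[k] for k in active_targets}

gallery = FeatureGallery(budget=2)
gallery.partial_fit(np.eye(3), [7, 7, 7], active_targets=[7])
print(len(gallery.samples[7]))      # only the last two features remain
gallery.partial_fit(np.ones((1, 3)), [8], active_targets=[8])
print(sorted(gallery.samples))      # track 7 is no longer active, so it is dropped
```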

That is the whole DeepSort pipeline. Next, let's look at some of the details more closely.

IOU match

How to get the cost matrix?

The cost matrix is initialized so that element (i, j) holds the cost between track i and detection j. For each track, the IOU between its Kalman-predicted box and each detection box is computed, and cost = 1 - IOU. However, if the track has gone more than one frame without an update (time_since_update > 1), its entire row is set to a very large value, INFTY_COST (1e+5):

def iou_cost(tracks, detections, track_indices=None,
             detection_indices=None):
    if track_indices is None:
        track_indices = np.arange(len(tracks))
    if detection_indices is None:
        detection_indices = np.arange(len(detections))

    cost_matrix = np.zeros((len(track_indices), len(detection_indices)))
    for row, track_idx in enumerate(track_indices):
        if tracks[track_idx].time_since_update > 1:
            cost_matrix[row, :] = linear_assignment.INFTY_COST
            continue

        bbox = tracks[track_idx].to_tlwh()
        candidates = np.asarray([detections[i].tlwh for i in detection_indices])
        cost_matrix[row, :] = 1. - iou(bbox, candidates)
    return cost_matrix
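The iou helper called above is not shown. Below is a sketch of it plus one row of the cost matrix. It assumes boxes in (top-left x, top-left y, width, height) format, as the tlwh names suggest. iou_cost_row is a hypothetical helper, and INFTY_COST = 1e5 follows the text.

```python
import numpy as np

INFTY_COST = 1e5

def iou(bbox, candidates):
    """IOU of one tlwh box against an (N, 4) array of tlwh boxes."""
    bbox = np.asarray(bbox, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    bbox_tl, bbox_br = bbox[:2], bbox[:2] + bbox[2:]
    cand_tl = candidates[:, :2]
    cand_br = candidates[:, :2] + candidates[:, 2:]
    tl = np.maximum(bbox_tl, cand_tl)        # top-left of the intersection
    br = np.minimum(bbox_br, cand_br)        # bottom-right of the intersection
    wh = np.maximum(0.0, br - tl)            # zero when the boxes do not overlap
    inter = wh.prod(axis=1)
    union = bbox[2:].prod() + candidates[:, 2:].prod(axis=1) - inter
    return inter / union

def iou_cost_row(track_tlwh, time_since_update, det_tlwhs):
    """One row of the IOU cost matrix: 1 - IOU, or INFTY_COST for stale tracks."""
    if time_since_update > 1:
        return np.full(len(det_tlwhs), INFTY_COST)
    return 1.0 - iou(track_tlwh, det_tlwhs)

dets = [[0, 0, 10, 10], [5, 0, 10, 10], [20, 20, 5, 5]]
print(iou_cost_row([0, 0, 10, 10], 1, dets))   # identical, half-overlapping, disjoint
```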

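After the cost matrix is computed, min_cost_matching sets every element greater than max_distance to max_distance + 1e-5. For IOU matching, max_distance is max_iou_distance (0.7 here), which is why the entries below become 0.70001:

cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5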

The second frame cost matrix is:

[[0.04281178 1.         1.         0.96899767 1.        ]
 [1.         0.03566279 1.         1.         1.        ]
 [1.         1.         0.04389799 1.         1.        ]
 [0.95802783 1.         1.         0.08525083 1.        ]]

# after clipping
[[0.04281178 0.70001    0.70001    0.70001    0.70001   ]
 [0.70001    0.03566279 0.70001    0.70001    0.70001   ]
 [0.70001    0.70001    0.04389799 0.70001    0.70001   ]
 [0.70001    0.70001    0.70001    0.08525083 0.70001   ]]

The clipped cost matrix is then fed to the Hungarian algorithm:

row_indices, col_indices = linear_assignment(cost_matrix)

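Not every track and detection ends up matched. Rows and columns that the Hungarian algorithm leaves unassigned become unmatched tracks and unmatched detections. In addition, any assigned pair whose cost exceeds max_distance is rejected, and both its track and its detection are treated as unmatched: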

matches, unmatched_tracks, unmatched_detections = [], [], []
for col, detection_idx in enumerate(detection_indices):
    if col not in col_indices:
        unmatched_detections.append(detection_idx)
for row, track_idx in enumerate(track_indices):
    if row not in row_indices:
        unmatched_tracks.append(track_idx)
for row, col in zip(row_indices, col_indices):
    track_idx = track_indices[row]
    detection_idx = detection_indices[col]
    if cost_matrix[row, col] > max_distance:
        unmatched_tracks.append(track_idx)
        unmatched_detections.append(detection_idx)
    else:
        matches.append((track_idx, detection_idx))
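The clip, solve and reject sequence can be checked against the second-frame numbers printed above. This sketch makes a few assumptions:

- SciPy's `linear_sum_assignment` serves as the Hungarian solver.
- Every row and column of the matrix is treated as a candidate.
- max_distance = 0.7, consistent with the 0.70001 entries.

`solve_assignment` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def solve_assignment(cost_matrix, max_distance):
    """Clip, run the Hungarian algorithm, and reject pairs costing more than max_distance."""
    cost = cost_matrix.copy()
    cost[cost > max_distance] = max_distance + 1e-5
    rows, cols = linear_sum_assignment(cost)
    matches, unmatched_tracks, unmatched_dets = [], [], []
    for r, c in zip(rows, cols):
        if cost[r, c] > max_distance:          # assigned, but too expensive
            unmatched_tracks.append(int(r))
            unmatched_dets.append(int(c))
        else:
            matches.append((int(r), int(c)))
    # rows/columns the solver left unassigned
    unmatched_tracks += [r for r in range(cost.shape[0]) if r not in rows]
    unmatched_dets += [c for c in range(cost.shape[1]) if c not in cols]
    return matches, sorted(unmatched_tracks), sorted(unmatched_dets)

# Second-frame IOU cost matrix (4 tracks x 5 detections) from the text
frame2 = np.array([
    [0.04281178, 1.0, 1.0, 0.96899767, 1.0],
    [1.0, 0.03566279, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.04389799, 1.0, 1.0],
    [0.95802783, 1.0, 1.0, 0.08525083, 1.0],
])
print(solve_assignment(frame2, max_distance=0.7))
```

It reproduces the second-frame IOU matching result: tracks 0 to 3 are matched to detections 0 to 3, and detection 4 is left unmatched.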

Cascade match

Now let's see how the cost matrix for cascade matching is computed. A track stores the features of multiple past detections, so there are several cosine distances between a track and a given detection in the current frame. The minimum of these is taken as the final cosine distance between the two. The resulting matrix is then gated with the Mahalanobis distance.

def gated_metric(tracks, dets, track_indices, detection_indices):
    features = np.array([dets[i].feature for i in detection_indices])
    targets = np.array([tracks[i].track_id for i in track_indices])
    cost_matrix = self.metric.distance(features, targets)  # compute the appearance cost matrix
    cost_matrix = linear_assignment.gate_cost_matrix(      # gate it with the Mahalanobis distance
        self.kf, cost_matrix, tracks, dets, track_indices,
        detection_indices)
    return cost_matrix

def distance(self, features, targets):
    cost_matrix = np.zeros((len(targets), len(features)))
    for i, target in enumerate(targets):
        cost_matrix[i, :] = self._metric(self.samples[target], features)
    return cost_matrix

def _nn_cosine_distance(x, y):
    distances = _cosine_distance(x, y)
    return distances.min(axis=0)  # take the minimum over the track's stored features
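_cosine_distance itself is not shown above. The sketch below assumes features are L2-normalized before the dot product, so that cosine distance = 1 - dot product. Rows index a track's stored samples and columns index the current detections, so `min(axis=0)` picks, for each detection, its closest stored sample. The function names are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """Pairwise cosine distance between rows of a (M, D) and rows of b (N, D)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def nn_cosine_distance(track_samples, det_features):
    """For each detection, the smallest distance to any stored sample of the track."""
    return cosine_distance(track_samples, det_features).min(axis=0)

# Hypothetical 2-D features: a track with two stored samples, two detections
samples = [[1.0, 0.0], [0.0, 1.0]]
dets = [[1.0, 1.0], [0.0, 2.0]]
print(nn_cosine_distance(samples, dets))
```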

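Inside linear_assignment.gate_cost_matrix, the detections are first converted to xyah format (center x, center y, aspect ratio, height):

measurements = np.asarray(
    [detections[i].to_xyah() for i in detection_indices])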

Then the Mahalanobis distance between each track's predicted state and the detections is computed. Every cost entry whose Mahalanobis distance exceeds gating_threshold (9.4877) is set to gated_cost (100000.0):

for row, track_idx in enumerate(track_indices):
    track = tracks[track_idx]
    gating_distance = kf.gating_distance(
        track.mean, track.covariance, measurements, only_position)
    cost_matrix[row, gating_distance > gating_threshold] = gated_cost
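kf.gating_distance returns a squared Mahalanobis distance in measurement space. The threshold 9.4877 is the 95% quantile of the chi-square distribution with 4 degrees of freedom, one per x, y, a, h. The sketch below takes an already-projected mean and covariance, which is a simplification: the repository projects the 8-dimensional track state first. It uses a Cholesky factor rather than an explicit inverse. The function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_triangular

GATING_THRESHOLD = 9.4877   # chi-square 95% quantile, 4 degrees of freedom
GATED_COST = 1e5

def squared_mahalanobis(proj_mean, proj_cov, measurements):
    """Squared Mahalanobis distance of each measurement row to N(proj_mean, proj_cov)."""
    L = np.linalg.cholesky(proj_cov)                       # proj_cov = L @ L.T
    d = np.asarray(measurements, dtype=float) - proj_mean
    z = solve_triangular(L, d.T, lower=True)               # solves L z = d
    return np.sum(z * z, axis=0)

def gate_row(cost_row, proj_mean, proj_cov, measurements):
    """Set cost entries whose Mahalanobis distance exceeds the threshold to GATED_COST."""
    row = np.array(cost_row, dtype=float)
    dist = squared_mahalanobis(proj_mean, proj_cov, measurements)
    row[dist > GATING_THRESHOLD] = GATED_COST
    return row

# With identity covariance the distance reduces to squared Euclidean distance
meas = [[1.0, 1.0, 0.0, 0.0], [3.0, 1.0, 0.0, 0.0]]
print(gate_row([0.05, 0.04], np.zeros(4), np.eye(4), meas))   # second entry is gated
```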

Finally, min_cost_matching sets every cost greater than max_distance (0.2 in cascade matching) to max_distance + 1e-5.

In the fourth frame, the cost matrix obtained by the cosine distance is

 [[0.02467382 0.29672492 0.14992237 0.20593166 0.25746107]
 [0.27289903 0.01389802 0.2490201  0.26275396 0.18523771]
 [0.1549592  0.25630915 0.00923228 0.10906434 0.27596951]
 [0.26783013 0.19509423 0.26934785 0.24842238 0.01052856]]

Next, the Mahalanobis distance is computed and applied to the cosine cost matrix. Every cosine cost whose Mahalanobis distance exceeds gating_threshold is set to gated_cost (100000.0).

Then the result obtained is

[[2.46738195e-02 1.00000000e+05 1.00000000e+05 1.00000000e+05 1.00000000e+05]
 [1.00000000e+05 1.38980150e-02 1.00000000e+05 1.00000000e+05 1.00000000e+05]
 [1.00000000e+05 1.00000000e+05 9.23228264e-03 1.00000000e+05 1.00000000e+05]
 [1.00000000e+05 1.00000000e+05 1.00000000e+05 1.00000000e+05 1.05285645e-02]]

Finally, every cost greater than max_distance (0.2 in cascade matching) is set to max_distance + 1e-5, giving the final cost matrix:

 [[0.02467382 0.20001    0.20001    0.20001    0.20001   ]
 [0.20001    0.01389802 0.20001    0.20001    0.20001   ]
 [0.20001    0.20001    0.00923228 0.20001    0.20001   ]
 [0.20001    0.20001    0.20001    0.20001    0.01052856]]

The cost matrix is then input into the Hungarian algorithm to solve.
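The fourth-frame numbers can be checked by replaying the gating and clipping steps in NumPy. The gating mask below is read off the printed gated matrix (the entries not replaced by 1e5); it is not recomputed from Kalman states. The Hungarian step uses SciPy's `linear_sum_assignment`. Some entries, such as 0.1499 at row 0, column 2, would pass the 0.2 threshold on appearance alone; the Mahalanobis gate is what rules them out.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

MAX_DISTANCE = 0.2   # cascade-matching threshold quoted in the text
GATED_COST = 1e5

# Fourth-frame cosine cost matrix (4 confirmed tracks x 5 detections)
cosine = np.array([
    [0.02467382, 0.29672492, 0.14992237, 0.20593166, 0.25746107],
    [0.27289903, 0.01389802, 0.2490201, 0.26275396, 0.18523771],
    [0.1549592, 0.25630915, 0.00923228, 0.10906434, 0.27596951],
    [0.26783013, 0.19509423, 0.26934785, 0.24842238, 0.01052856],
])

# Entries that survived Mahalanobis gating in the printed result
passed_gate = np.zeros_like(cosine, dtype=bool)
for r, c in [(0, 0), (1, 1), (2, 2), (3, 4)]:
    passed_gate[r, c] = True

cost = np.where(passed_gate, cosine, GATED_COST)      # gating step
cost[cost > MAX_DISTANCE] = MAX_DISTANCE + 1e-5       # clipping step
rows, cols = linear_sum_assignment(cost)
matches = [(int(r), int(c)) for r, c in zip(rows, cols)
           if cost[r, c] <= MAX_DISTANCE]
print(matches)
```

The result agrees with the matches_a printed for the fourth frame, with detection 3 left over for IOU matching.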

The steps of DeepSort are as follows:

1. Split the tracks into unconfirmed_tracks and confirmed_tracks.
2. Cascade-match confirmed_tracks with the detections:
   - (a) Compute the cosine-distance cost matrix between track features and detection features.
   - (b) Compute the Mahalanobis distance and apply it to the cost matrix: wherever it exceeds gating_threshold, set the corresponding cost to gated_cost.
   - (c) Set every cost greater than max_distance to max_distance + 1e-5.
   - (d) Solve with the Hungarian algorithm and reject assigned pairs whose cost exceeds max_distance.
   - Repeat (a) to (d) for each time_since_update level, most recently updated tracks first, and merge the results.
3. Form the candidate tracks from unconfirmed_tracks plus the tracks that cascade matching left unmatched with time_since_update = 1, then IOU-match the candidates with the unmatched detections:
   - Compute the IOU cost matrix between the predicted boxes and the detections, and clip it at max_distance.
   - Solve with the Hungarian algorithm and reject pairs whose cost exceeds max_distance.
4. Merge the cascade matching and IOU matching results.
5. For each finally matched track:
   - Kalman update
   - store the detection's appearance feature
   - hits += 1
   - reset time_since_update to 0
   - update the state, switching it to Confirmed once hits >= n_init
6. For each finally unmatched track, decide whether to keep or delete it. A Tentative track is deleted immediately; any other track is deleted once it has gone more than max_age (30) frames without an update.
7. Create a new track for each finally unmatched detection.
8. Remove the Deleted tracks and update the distance metric with the confirmed tracks' features.

[Figure: the whole DeepSort process]



WeifaGan
