Abstract: LaneNet is an end-to-end lane line detection method consisting of two network models: LaneNet and H-Net.
This article is shared from the Huawei Cloud Community post "[Paper Interpretation] LaneNet End-to-End Lane Line Detection Based on Instance Segmentation", author: a small tree x.
Preface
This is an end-to-end lane line detection method consisting of two network models, LaneNet + H-Net.
LaneNet is a multi-task model that decomposes instance segmentation into "semantic segmentation" and "vector representation of pixels" (pixel embedding), and then clusters the results of the two branches to obtain the instance segmentation result.
H-Net is a small network responsible for predicting the transformation matrix H, which is used to re-fit all pixels belonging to the same lane line. In other words, it learns the perspective transformation parameters for a given input image, so that the perspective transformation fits lane lines well even on sloped roads.
The overall network structure is as follows:
Paper address: Towards End-to-End Lane Detection: an Instance Segmentation Approach
Open source dataset TuSimple: https://github.com/TuSimple/tusimple-benchmark/issues/3
Open source code: https://github.com/MaybeShewill-CV/lanenet-lane-detection
One, LaneNet
LaneNet performs instance segmentation on the input image. The network is split into two branches: one for semantic segmentation and one for the vector representation of pixels. The results of the two branches are then clustered to obtain the instance segmentation result: LaneNet assigns a lane ID to each lane line pixel.
1.1 Network structure
First look at the network structure:
branch 1: semantic segmentation (Segmentation) performs binary classification on each pixel to determine whether it belongs to a lane line or the background;
branch 2: pixel embedding (Embedding) maps image features into an embedding space, where the relationships between features are expressed as distances between embedding vectors.
Clustering is implemented based on the Mean-Shift algorithm. The results of the two branches are clustered to obtain the result of instance segmentation.
LaneNet is an encoder-decoder model based on ENet. As shown in the following figure, ENet is composed of 5 stages, among which stage 2 and stage 3 are basically the same; stages 1, 2, and 3 form the encoder, and stages 4 and 5 form the decoder.
1.2 Semantic segmentation
This part performs binary classification on each pixel to determine whether it belongs to a lane line or the background. Since the two classes are highly imbalanced, the design follows ENet, and the loss function is the standard cross-entropy loss.
When designing the semantic segmentation ground truth, in order to deal with the occlusion problem, the paper restores (estimates) the lane lines and dashed lines that are occluded by vehicles;
The loss uses softmax cross-entropy. To address the uneven sample distribution, a bounded inverse class weight is used to weight the loss:
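The weighting formula, reconstructed here from the ENet paper's bounded inverse class weighting (the original post shows it as an image), is:

$$w_{\text{class}} = \frac{1}{\ln(c + p_{\text{class}})}$$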
Among them, p is the probability that the corresponding class appears in the overall sample, and c is a hyperparameter (the ENet paper uses c = 1.02).
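A minimal sketch of computing these weights from label frequencies (the function name and defaults are illustrative, not taken from the official repo):

```python
import numpy as np

def bounded_inverse_class_weights(labels, num_classes=2, c=1.02):
    """Per-class loss weights w = 1 / ln(c + p).

    labels: integer array of pixel labels (0 = background, 1 = lane line)
    c:      bounding hyperparameter (1.02 is the value from the ENet paper)
    """
    counts = np.bincount(labels.ravel(), minlength=num_classes)
    p = counts / counts.sum()        # empirical class probabilities
    return 1.0 / np.log(c + p)
```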
Loss design reference: the paper ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
1.3 Pixel mapping to embedding space
After the lane pixels are obtained by segmentation, in order to know which pixels belong to which lane, a lane instance embedding branch network needs to be trained. It outputs an embedding for each lane line pixel such that pixels belonging to the same lane are close in embedding space, and vice versa. Based on this property, the pixels of each lane line can be clustered.
To distinguish which lane each lane-line pixel belongs to, an embedding vector is initialized for each pixel, and the loss is designed so that the distance between embedding vectors belonging to the same lane line is as small as possible, while the distance between embedding vectors of different lane lines is as large as possible.
The loss function of this part is composed of three terms: a variance loss, a distance loss, and a regularization loss:
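Reconstructed from the discriminative loss paper in this article's notation (the original post shows the equations as images):

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\left\|\mu_c - x_i\right\| - \delta_v\right]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1\\ c_B \neq c_A}}^{C}\left[\delta_d - \left\|\mu_{c_A} - \mu_{c_B}\right\|\right]_+^2$$

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\left\|\mu_c\right\|$$

where $[x]_+ = \max(0, x)$. The discriminative loss paper combines them as $L = \alpha L_{var} + \beta L_{dist} + \gamma L_{reg}$ with $\alpha = \beta = 1$ and $\gamma = 0.001$.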
Among them, C is the number of lane lines, Nc is the number of pixels belonging to lane line c, μc is the mean embedding vector of lane line c, and xi is a pixel embedding.
The loss function is derived from the paper "Semantic Instance Segmentation with a Discriminative Loss Function".
variance loss (Lvar): when the distance between a pixel embedding xi and the corresponding lane line mean vector μc is greater than δv, the model is updated so that xi moves closer to μc;
distance loss (Ldist): when the distance between the mean vectors μca and μcb of different lane lines is less than δd, the model is updated so that μca and μcb move away from each other.
In short, the variance loss (Lvar) pulls each pixel embedding toward the mean vector μc of its lane line, while the distance loss (Ldist) pushes the cluster centers away from each other.
1.4 Clustering
Embedding (pixel mapping to embedding space) already provides good feature vectors for clustering. Using these feature vectors, we can use any clustering algorithm to complete the goal of instance segmentation.
Clustering is implemented based on the Mean-Shift algorithm, which clusters the results of the two branches to obtain the result of instance segmentation.
First, mean-shift clustering is used so that cluster centers move along the direction of increasing density, which prevents outliers from being pulled into a cluster; then the pixel embeddings are partitioned until all lane line pixels are assigned to their corresponding lanes.
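A minimal sketch of this clustering step using scikit-learn's MeanShift (the function name and bandwidth value are illustrative, not taken from the official repo):

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_lane_embeddings(binary_mask, embeddings, bandwidth=1.5):
    """Assign a lane ID to every lane pixel.

    binary_mask: (H, W) bool array from the segmentation branch
    embeddings:  (H, W, D) pixel embeddings from the embedding branch
    """
    coords = np.argwhere(binary_mask)            # (N, 2) lane pixel coords
    feats = embeddings[binary_mask]              # (N, D) their embeddings
    labels = MeanShift(bandwidth=bandwidth).fit_predict(feats)
    instance_map = np.zeros(binary_mask.shape, dtype=np.int32)
    instance_map[coords[:, 0], coords[:, 1]] = labels + 1   # 0 = background
    return instance_map
```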
Two, H-Net
The output of LaneNet is a collection of pixels for each lane line, and a curve still needs to be fitted through these pixels. The traditional method is to project the image into a top view (a bird's-eye view) and then fit a 2nd- or 3rd-order polynomial. In this method, the transformation matrix H is computed only once and the same matrix is used for all images, which leads to errors when the ground plane changes (mountains, hills).
To solve this problem, the paper trains a neural network, H-Net, that predicts the transformation matrix H. The input of the network is the image and the output is the transformation matrix H:
The transformation matrix is constrained by fixing certain elements to zero, so that horizontal lines remain horizontal under the transformation (that is, the transformed y coordinate is not affected by the x coordinate):
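Reconstructed from the LaneNet paper (shown as an image in the original post), the constrained matrix has the form:

$$H = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & f & 1 \end{bmatrix}$$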
As the formula above shows, the transformation matrix H has only 6 free parameters, so the output of H-Net is a 6-dimensional vector. H-Net is composed of 6 ordinary convolutional layers and one fully connected layer; its network structure is shown in the figure:
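A minimal sketch of the fitting step under these assumptions (variable names are illustrative; the official repo implements this differently):

```python
import numpy as np

def fit_lane(pixels, H, degree=2, num_samples=50):
    """Project one lane's pixels with a 3x3 homography H (e.g. predicted by
    H-Net), fit a polynomial in the bird's-eye view, and project the fitted
    curve back to image coordinates.

    pixels: (N, 2) array of (x, y) pixel coordinates of one lane line
    H:      (3, 3) transformation matrix
    """
    # to homogeneous coordinates, then into the bird's-eye view
    pts = np.column_stack([pixels, np.ones(len(pixels))])
    warped = (H @ pts.T).T
    warped = warped[:, :2] / warped[:, 2:3]          # perspective divide

    # fit x' = f(y') with a 2nd-order polynomial (3rd order also works)
    coeffs = np.polyfit(warped[:, 1], warped[:, 0], deg=degree)

    # sample the fitted curve and project back with the inverse transform
    ys = np.linspace(warped[:, 1].min(), warped[:, 1].max(), num_samples)
    xs = np.polyval(coeffs, ys)
    curve = np.column_stack([xs, ys, np.ones_like(ys)])
    back = (np.linalg.inv(H) @ curve.T).T
    return back[:, :2] / back[:, 2:3]                # image coordinates
```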
Three, Model effect
Lane line detection effect compared with other models:
The accuracy of the model reaches 96.4%, which is quite good.
Model speed:
2018: Measured on an NVIDIA 1080 Ti with 512×256 images, lane detection runs at about 52 FPS overall; the detection speed is fast and the real-time performance is high.
2020: The real-time segmentation model BiseNetV2 was added as the backbone of LaneNet. The new model reaches 78 FPS for single-image inference. The new LaneNet model trained on BiseNetV2 can be found in the open source repository linked below.
Model effect:
Four, Open source code
Open source code: https://github.com/MaybeShewill-CV/lanenet-lane-detection
The open source code uses the LaneNet deep neural network model for real-time lane detection (an unofficial implementation).
The model consists of an encoder-decoder stage, a binary semantic segmentation stage, and an instance semantic segmentation stage using the discriminative loss function, for real-time lane detection tasks.
Operating environment of the code (the following has been personally tested):
system: Ubuntu 16.04 (x64)
language: Python 3.6
deep learning framework: TensorFlow 1.15.0 (GPU version)
other dependent libraries: cv2 (OpenCV), matplotlib, scikit_learn, numpy, etc.
Practice process:
1) Create a conda environment
conda create -n LineNet python=3.6
2) Enter the environment just created
conda activate LineNet
3) Install the dependencies listed in requirements.txt (here I also used the Alibaba Cloud mirror to speed up the installation)
pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Looking at this file, I found a problem: TensorFlow is listed twice (tensorflow_gpu==1.15.0 and tensorflow==1.15.0). Delete one according to your setup; since I use GPU acceleration, I deleted tensorflow==1.15.0.
4) Install NVIDIA's cudatoolkit10.0 version
conda install cudatoolkit=10.0
5) Install version 7.6.5 of NVIDIA's cuDNN deep learning library
conda install cudnn=7.6.5
6) Set lanenet_model environment variable
export PYTHONPATH="${PYTHONPATH}:your_path/lanenet-lane-detection/lanenet_model"
your_path is the absolute path where the lanenet-lane-detection folder is located.
7) Download the model
Link: https://pan.baidu.com/s/1-av2fK7BQ05HXjKMzraBSA Extraction code: 1024
A total of 4 files, about 30 MB.
Then, under the lanenet-lane-detection directory, create a new subdirectory named model_weights and place the 4 model files in it; they will be used later.
8) Test model
python tools/test_lanenet.py --weights_path model_weights/tusimple_lanenet.ckpt --image_path ./data/tusimple_test_image/3.jpg
Successfully use GPU acceleration:
Semantic segmentation and pixel embedding effects:
Entity segmentation effect:
Model effect:
References:
https://www.jianshu.com/p/c6d38d648509
https://www.cnblogs.com/xuanyuyt/p/11523192.html
LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Discriminative Loss: Semantic Instance Segmentation with a Discriminative Loss Function
This article is provided for reference and study only. Thank you.