Abstract: LaneNet is an end-to-end lane line detection method consisting of two network models: LaneNet and H-Net.
This article is shared from the Huawei Cloud Community post "[Paper Interpretation] LaneNet End-to-End Lane Line Detection Based on Instance Segmentation", author: a small tree x.
Preface
This is an end-to-end lane line detection method consisting of two network models, LaneNet + H-Net.
LaneNet is a multi-task model that decomposes instance segmentation into "semantic segmentation" and "vector representation of pixels" (pixel embedding), and then clusters the results of the two branches to obtain the instance segmentation result.
H-Net is a small network responsible for predicting the transformation matrix H, which is used to re-fit all pixels belonging to the same lane line. In other words, it learns the perspective transformation parameters for a given input image, so that the perspective transformation fits lane lines well even on sloped roads.
The overall network structure is as follows:
Paper address: Towards End-to-End Lane Detection: an Instance Segmentation Approach
Open source dataset TuSimple: https://github.com/TuSimple/tusimple-benchmark/issues/3
Open source code: https://github.com/MaybeShewill-CV/lanenet-lane-detection
One, LaneNet
LaneNet performs instance segmentation on the input image. The network is split into two branches: one for semantic segmentation and one for the vector representation of pixels. The results of the two branches are then clustered to obtain the instance segmentation result: LaneNet assigns a lane ID to each lane line pixel.
1.1 Network structure
First look at the network structure:
branch 1: semantic segmentation (Segmentation) performs binary classification on each pixel to determine whether it belongs to a lane line or the background;
branch 2: pixel embedding (Embedding) maps image features into an embedding space, where the relationships between features are expressed as distances between embedding vectors.
Clustering is implemented based on the Mean-Shift algorithm. The results of the two branches are clustered to obtain the result of instance segmentation.
LaneNet is an encoder-decoder model based on ENet. As shown in the following figure, ENet is composed of 5 stages, among which stage 2 and stage 3 are basically the same; stages 1, 2, and 3 form the encoder, and stages 4 and 5 form the decoder.
1.2 Semantic segmentation
This part performs binary classification on each pixel to determine whether it belongs to a lane line or the background. Since the two classes are highly imbalanced, the design follows ENet, and the loss function is the standard cross-entropy loss.
When designing the semantic segmentation ground truth, in order to deal with the occlusion problem, the paper restores (estimates) the lane lines and dashed lines that are occluded by vehicles;
The loss uses softmax cross-entropy. To address the uneven sample distribution, a bounded inverse class weight is used to weight the loss:
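The weighting formula, reconstructed here from the ENet paper's bounded inverse class weighting (the original post shows it as an image), is:

$$w_{\text{class}} = \frac{1}{\ln(c + p_{\text{class}})}$$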
Among them, p is the probability that the corresponding class appears in the overall sample, and c is a hyperparameter (the ENet paper uses c = 1.02).
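A minimal sketch of computing these weights from label frequencies (the function name and defaults are illustrative, not taken from the official repo):

```python
import numpy as np

def bounded_inverse_class_weights(labels, num_classes=2, c=1.02):
    """Per-class loss weights w = 1 / ln(c + p).

    labels: integer array of pixel labels (0 = background, 1 = lane line)
    c:      bounding hyperparameter (1.02 is the value from the ENet paper)
    """
    counts = np.bincount(labels.ravel(), minlength=num_classes)
    p = counts / counts.sum()        # empirical class probabilities
    return 1.0 / np.log(c + p)
```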
Loss design reference: the paper ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.
1.3 Pixel mapping to embedding space
After the lane pixels are obtained by segmentation, in order to know which pixels belong to which lane, a lane instance embedding branch network needs to be trained. It outputs an embedding for each lane line pixel such that pixels belonging to the same lane are close in embedding space, and vice versa. Based on this property, the pixels of each lane line can be clustered.
To distinguish which lane each lane-line pixel belongs to, an embedding vector is initialized for each pixel, and the loss is designed so that the distance between embedding vectors belonging to the same lane line is as small as possible, while the distance between embedding vectors of different lane lines is as large as possible.
The loss function of this part is composed of three terms: a variance loss, a distance loss, and a regularization loss:
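Reconstructed from the discriminative loss paper in this article's notation (the original post shows the equations as images):

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\left\|\mu_c - x_i\right\| - \delta_v\right]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1\\ c_B \neq c_A}}^{C}\left[\delta_d - \left\|\mu_{c_A} - \mu_{c_B}\right\|\right]_+^2$$

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\left\|\mu_c\right\|$$

where $[x]_+ = \max(0, x)$. The discriminative loss paper combines them as $L = \alpha L_{var} + \beta L_{dist} + \gamma L_{reg}$ with $\alpha = \beta = 1$ and $\gamma = 0.001$.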
Among them, C is the number of lane lines, Nc is the number of pixels belonging to lane line c, μc is the mean embedding vector of lane line c, and xi is a pixel embedding.
The loss function is derived from the paper "Semantic Instance Segmentation with a Discriminative Loss Function".
variance loss (Lvar): when the distance between a pixel embedding xi and the corresponding lane line mean vector μc is greater than δv, the model is updated so that xi moves closer to μc;
distance loss (Ldist): when the distance between the mean vectors μca and μcb of different lane lines is less than δd, the model is updated so that μca and μcb move away from each other.
In short, the variance loss (Lvar) pulls each pixel embedding toward the mean vector μc of its lane line, while the distance loss (Ldist) pushes the cluster centers away from each other.
1.4 Clustering
Embedding (pixel mapping to embedding space) already provides good feature vectors for clustering. Using these feature vectors, we can use any clustering algorithm to complete the goal of instance segmentation.
Clustering is implemented based on the Mean-Shift algorithm, which clusters the results of the two branches to obtain the result of instance segmentation.
First, mean-shift clustering is used so that cluster centers move along the direction of increasing density, which prevents outliers from being pulled into a cluster; then the pixel embeddings are partitioned until all lane line pixels are assigned to their corresponding lanes.
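A minimal sketch of this clustering step using scikit-learn's MeanShift (the function name and bandwidth value are illustrative, not taken from the official repo):

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_lane_embeddings(binary_mask, embeddings, bandwidth=1.5):
    """Assign a lane ID to every lane pixel.

    binary_mask: (H, W) bool array from the segmentation branch
    embeddings:  (H, W, D) pixel embeddings from the embedding branch
    """
    coords = np.argwhere(binary_mask)            # (N, 2) lane pixel coords
    feats = embeddings[binary_mask]              # (N, D) their embeddings
    labels = MeanShift(bandwidth=bandwidth).fit_predict(feats)
    instance_map = np.zeros(binary_mask.shape, dtype=np.int32)
    instance_map[coords[:, 0], coords[:, 1]] = labels + 1   # 0 = background
    return instance_map
```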
Two, H-Net
The output of LaneNet is a collection of pixels for each lane line, and a curve still needs to be fitted through these pixels. The traditional method is to project the image into a top view (a bird's-eye view) and then fit a 2nd- or 3rd-order polynomial. In this method, the transformation matrix H is computed only once and the same matrix is used for all images, which leads to errors when the ground plane changes (mountains, hills).
To solve this problem, the paper trains a neural network, H-Net, that predicts the transformation matrix H. The input of the network is the image and the output is the transformation matrix H:
The transformation matrix is constrained by fixing certain elements to zero, so that horizontal lines remain horizontal under the transformation (that is, the transformed y coordinate is not affected by the x coordinate):
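Reconstructed from the LaneNet paper (shown as an image in the original post), the constrained matrix has the form:

$$H = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & f & 1 \end{bmatrix}$$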
As the formula above shows, the transformation matrix H has only 6 free parameters, so the output of H-Net is a 6-dimensional vector. H-Net is composed of 6 ordinary convolutional layers and one fully connected layer; its network structure is shown in the figure:
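A minimal sketch of the fitting step under these assumptions (variable names are illustrative; the official repo implements this differently):

```python
import numpy as np

def fit_lane(pixels, H, degree=2, num_samples=50):
    """Project one lane's pixels with a 3x3 homography H (e.g. predicted by
    H-Net), fit a polynomial in the bird's-eye view, and project the fitted
    curve back to image coordinates.

    pixels: (N, 2) array of (x, y) pixel coordinates of one lane line
    H:      (3, 3) transformation matrix
    """
    # to homogeneous coordinates, then into the bird's-eye view
    pts = np.column_stack([pixels, np.ones(len(pixels))])
    warped = (H @ pts.T).T
    warped = warped[:, :2] / warped[:, 2:3]          # perspective divide

    # fit x' = f(y') with a 2nd-order polynomial (3rd order also works)
    coeffs = np.polyfit(warped[:, 1], warped[:, 0], deg=degree)

    # sample the fitted curve and project back with the inverse transform
    ys = np.linspace(warped[:, 1].min(), warped[:, 1].max(), num_samples)
    xs = np.polyval(coeffs, ys)
    curve = np.column_stack([xs, ys, np.ones_like(ys)])
    back = (np.linalg.inv(H) @ curve.T).T
    return back[:, :2] / back[:, 2:3]                # image coordinates
```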
Three, Model effect
Lane line detection effect compared with other models:
The accuracy of the model reaches 96.4%, which is quite good.
Model speed:
2018: Measured on an NVIDIA 1080 Ti with 512×256 images, lane detection runs at about 52 FPS overall; the detection speed is fast and the real-time performance is high.
2020: The real-time segmentation model BiseNetV2 was added as the backbone of LaneNet. The new model reaches 78 FPS for single-image inference. The new LaneNet model trained on BiseNetV2 can be found in the open source repository linked below.
Model effect:
Four, Open source code
Open source code: https://github.com/MaybeShewill-CV/lanenet-lane-detection
The open source code uses the LaneNet deep neural network model for real-time lane detection (an unofficial implementation).
The model consists of an encoder-decoder stage, a binary semantic segmentation stage, and an instance semantic segmentation stage using the discriminative loss function, for real-time lane detection tasks.
Operating environment of the code (the following has been personally tested):
system: Ubuntu 16.04 (x64)
language: Python 3.6
deep learning framework: TensorFlow 1.15.0 (GPU version)
other dependent libraries: cv2 (OpenCV), matplotlib, scikit_learn, numpy, etc.
Practice process:
1) Create a conda environment
conda create -n LineNet python=3.6
2) Enter the environment just created
conda activate LineNet
3) Install the dependencies listed in requirements.txt (here I also used the Alibaba Cloud mirror to speed up the installation)
pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Looking at this file, I found a problem: TensorFlow is listed twice (tensorflow_gpu==1.15.0 and tensorflow==1.15.0). Delete one according to your setup; since I use GPU acceleration, I deleted tensorflow==1.15.0.
4) Install NVIDIA's cudatoolkit10.0 version
conda install cudatoolkit=10.0
5) Install version 7.6.5 of NVIDIA's cuDNN deep learning library
conda install cudnn=7.6.5
6) Set lanenet_model environment variable
export PYTHONPATH="${PYTHONPATH}:your_path/lanenet-lane-detection/lanenet_model"
your_path is the absolute path where the lanenet-lane-detection folder is located.
7) Download the model
Link: https://pan.baidu.com/s/1-av2fK7BQ05HXjKMzraBSA Extraction code: 1024
A total of 4 files, about 30 MB.
Then, under the lanenet-lane-detection directory, create a new subdirectory named model_weights and place the 4 model files in it; they will be used later.
8) Test model
python tools/test_lanenet.py --weights_path model_weights/tusimple_lanenet.ckpt --image_path ./data/tusimple_test_image/3.jpg
Successfully use GPU acceleration:
Semantic segmentation and pixel embedding effects:
Entity segmentation effect:
Model effect:
References:
https://www.jianshu.com/p/c6d38d648509
https://www.cnblogs.com/xuanyuyt/p/11523192.html
LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Discriminative Loss: Semantic Instance Segmentation with a Discriminative Loss Function
This article is provided for reference and study only. Thank you.