Paper Time｜Open spatiotemporal big data helps intelligent bus route planning

Rapid urbanization has brought explosive growth in both residents' travel demand and the scale of urban public transportation. How to better develop and manage public transport, and how to optimize its social and economic benefits, has long been a concern. In recent years big data technology has matured and its use in transportation has deepened. Using "spatiotemporal big data" to help public transport optimize routes is one successful application.

The third issue of OceanBase Paper Time invited Wang Sheng, associate professor at the School of Computer Science of Wuhan University, to give a talk titled "Open spatiotemporal big data helps intelligent bus route planning". Focusing on urban open spatiotemporal big data, Professor Wang covered four aspects: the concept of open spatiotemporal data, research directions, methods, and practical applications of the research results. He introduced his research on public transport capacity estimation and route planning based on open spatiotemporal data, and shared how the work achieves the two goals of maximum capacity and convenient transfers, with verification on the New York bus network. The paper shared in this issue, "Public Transport Planning: When Transit Network Connectivity Meets Commuting Demand", was published at SIGMOD 2021 (the ACM SIGMOD International Conference on Management of Data).

Today we bring you a written recap of the talk. You are welcome to read, discuss, and share it.

The following is compiled from the live broadcast and the related papers shared by the speaker:

Thank you to OceanBase for the invitation. I am honored to take this opportunity to share my research of the past few years. In 2021, I completed my postdoctoral fellowship at New York University and returned to China to join the School of Computer Science of Wuhan University as an associate professor. My talk today is "Open Spatiotemporal Big Data Assists Intelligent Bus Route Planning", and it covers four aspects: research background, research questions, research methods, and experimental evaluation.

What is the use of spatiotemporal data research?

Data has become the new factor of production

In 2021, the State Council issued the "14th Five-Year Plan for Digital Economy Development", which pointed out that the multiplier effect of data on production efficiency has become increasingly prominent, and that data has become the production factor that best characterizes our time. The plan also mentions "smart city" five times, for example: "in combination with the construction of new smart cities, accelerate the integration of urban data and the cultivation of industrial ecology, and improve the level of urban data operation and development and utilization."

Many cities at home and abroad have already built open government data platforms, such as Wuhan's public data open platform ( https://data.wuhan.gov.cn/ ) and New York City's NYC Open Data ( https://opendata.cityofnewyork.us/ ). Building these platforms further strengthens the role of data as a production factor.

According to statistics, at least 60% of open datasets contain geographic information, including road network data, bus network data, and trajectory data (vehicles, pedestrians). For example, the spatiotemporal data platform OpenStreetMap ( https://www.openstreetmap.org/ ) provides map datasets for the whole world. The following picture shows Beijing map data downloaded from OpenStreetMap.

The table below is from our paper "A Survey on Trajectory Data Management, Analytics, and Learning", published in ACM Computing Surveys. It lists several existing urban trajectory datasets, which fall roughly into three groups: people, vehicles (cars, trucks, trains, buses, trams, etc.), and others (animals and hurricanes). The open availability of so many trajectory datasets facilitates research. Human-derived trajectory data (including vehicles) is a common source of trajectory data and currently the largest one. For example, New York City recorded 1.1 billion individual taxi trips from January 2009 to June 2015.

The importance of spatiotemporal data research

Spatiotemporal data research already has relatively mature applications in epidemic prevention and control, intelligent public transportation, and other areas of public welfare, and it is playing an increasingly important role.

In epidemic prevention and control: trajectory similarity can be used to accurately calculate how long a person shared the same place and time with a confirmed case, determine infection risk, and quickly find close contacts. We have also seen epidemic prevention mobile applications that automatically record daily trajectories, intelligently map them to public indoor places and bus schedules, and calculate density directly on the phone without revealing user privacy.

In smart public transport: we evaluate the connectivity of the public transport network and plan new bus routes that make transfers easier, validated on New York and Chicago datasets. At the same time, real-time bus systems can collect massive amounts of live bus location data, which can be used to evaluate punctuality, predict arrival times, and plan itineraries intelligently.

Difficulties in Trajectory Data Management

Large data volume: for example, Wuhan has more than 4 million motor vehicles, which makes storing and managing the resulting data difficult;
Large differences: vehicles, pedestrians, and electric vehicles have different trajectory characteristics, which makes analysis and modeling harder;
Data fusion: trajectories need to be integrated with road network, lane, public place, and other data;
Many query types: scenarios include spatiotemporal co-occurrence, intersection monitoring, and others;
Difficult matching: similarity matching models have high complexity.

An overview of the author's research in the field of spatiotemporal data management

My research in the field of spatiotemporal data management mainly includes three directions:

1. Advancing basic theoretical research on large-scale trajectory data management, with new results in trajectory data preprocessing, storage, compression, similarity measurement, and integrated indexing;

2. Letting real applications drive the research and proposing key technical solutions for real scenarios, such as applying trajectory data to public transportation planning, real-time traffic monitoring, and smart tourism route planning;

3. Giving equal weight to basic research and prototype system development. We built and open sourced the first vehicle trajectory search engine. It efficiently supports a variety of queries and novel trajectory similarity measures, and all the proposed algorithms are demonstrated in the online interactive system.

Exploration and analysis of spatiotemporal data research problems

Next, I will introduce the specific application of spatiotemporal data research in real life, taking the re-planning of New York City bus routes as an example.

Public transportation is an effective way to relieve urban traffic congestion, and 56% of New York's population uses it to travel. On the one hand, trajectory data reflects new travel needs, such as why people choose taxis instead of buses when commuting and whether supply and demand are out of balance. On the other hand, traditional methods such as censuses and questionnaires cannot capture public demand accurately or promptly, so using trajectory data to plan new bus routes is particularly important.

Cities rise and fall, and so do areas within a city, usually accompanied by large-scale population movement. When that happens, a bus system designed for the past can no longer meet current demand, which results in wasted or overloaded capacity. Spatiotemporal data analysis can help cities plan bus routes more rationally.

Research Question 1: Capacity Estimation of Bus Routes

For bus route planning, the primary consideration is undoubtedly capacity. If a new bus line is built, how many people will ride it, and can it recover its cost or turn a profit? How can we maximize capacity and carry as many passengers as possible?

As shown in the figure below, when planning bus routes we need to consider Ridership (passenger flow) and Coverage. If only Ridership is considered, bus lines run on densely populated roads with more frequent service, but because the lines are concentrated, most passengers need to walk farther to reach a bus stop. If only Coverage is considered, bus routes run on more streets across a larger area and passengers only need to walk a short distance to a stop, but because the lines are spread out, service is less frequent and people wait longer for the bus.

Therefore, when planning a new bus route, we need to estimate its capacity from travel trajectories by calculating how many passengers would choose it as their nearest route, so as to maximize capacity.

Research question 2: How to make the new route easy to transfer

Another key issue in bus route planning is the ease of transfers. People can rarely reach their destination on a single line. In most cases they need to transfer, whether between buses or to subways, ferries, and so on.

As shown in the image below, Connections and One-Seat Rides are two ways of getting from origin to destination. With Connections, bus lines are more concentrated and straight, so passengers need an extra trip between the station and their origin or destination, but the wait is shorter and the whole trip takes less time. With One-Seat Rides, the bus route is more circuitous and passengers need only one ride to get from origin to destination, but the wait is longer and the overall journey takes more time.

Therefore, transfer convenience should also be fully considered when planning bus lines. The first difficulty is the lack of a formal definition of a transfer convenience index. The second is how to keep transfers convenient while maximizing capacity, that is, multi-objective route optimization.

Using spatiotemporal data to study how to plan bus routes

We propose two research methods, "reverse k-nearest neighbor query" and "bus route planning algorithm", for capacity estimation and re-planning of bus routes.

Reverse k-nearest neighbor query

1. Problem Definition

When designing this method, we mainly used two kinds of data: NYC Taxi Data and existing bus network data (as shown in the figure below). The specific problem is: given a new bus route Q, find how many (taxi) trajectories have Q among their k nearest routes. The basic solution is to find the k nearest routes for each trajectory and check whether Q is in the result list.

I worked on this problem during my Ph.D. The result, "Reverse k Nearest Neighbor Search over Trajectories", was published in the journal IEEE Transactions on Knowledge and Data Engineering in April 2018.
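
The basic solution can be sketched in a few lines. This is a minimal brute-force illustration, not the indexed algorithm from the paper. It assumes each trajectory is reduced to a few points (for example pick-up and drop-off), each route is a list of stop coordinates, and the trajectory-to-route distance is the sum of point-to-nearest-stop distances. The function names and sample data are hypothetical.

```python
from math import hypot


def point_to_route(p, route):
    """Distance from a point to the closest stop of a route (stops only, an assumption)."""
    return min(hypot(p[0] - s[0], p[1] - s[1]) for s in route)


def trajectory_to_route(traj, route):
    """Assumed distance: sum of point-to-route distances over the trajectory's points."""
    return sum(point_to_route(p, route) for p in traj)


def rknn_trajectories(query, routes, trajectories, k):
    """Return indices of trajectories that rank `query` among their k nearest routes."""
    result = []
    for i, traj in enumerate(trajectories):
        d_query = trajectory_to_route(traj, query)
        # Count existing routes that are strictly closer than the query route.
        closer = sum(1 for r in routes if trajectory_to_route(traj, r) < d_query)
        if closer < k:
            result.append(i)
    return result


# Two existing routes, one candidate route Q, three taxi trips (made-up coordinates).
routes = [[(0, 0), (1, 0), (2, 0)], [(0, 5), (1, 5), (2, 5)]]
query = [(0, 2), (1, 2), (2, 2)]
trajectories = [
    [(0, 1.8), (2, 2.1)],  # closest to Q
    [(0, 0.1), (2, 0.2)],  # closest to the first existing route
    [(1, 4.9), (2, 5.0)],  # closest to the second existing route
]
capacity_k1 = rknn_trajectories(query, routes, trajectories, k=1)
capacity_k2 = rknn_trajectories(query, routes, trajectories, k=2)
```

The size of the returned list is the capacity estimate for Q. The cost is one distance computation per trajectory per route, which is why this brute-force form does not scale to a city-sized dataset.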

2. Main technical challenges and solutions

This problem poses two main technical challenges: high computational complexity and a huge amount of data. For example, New York has 8 million people and more than 2,000 bus routes.

We therefore use database indexing to address these challenges: similar trajectories are grouped together by precomputation and processed in batches, which avoids one-to-one matching. We then extend the query into a tool for planning bus routes. Given a start and an end point, we prepare some candidate routes, calculate the capacity of each, and use graph pruning to quickly find a route that maximizes capacity.

The figure below shows 4 routes: Original (original route), Shortest (shortest route), MaxRkNNT (the route that attracts the most passengers), and MinRkNNT (the route that attracts the fewest passengers). The original route and the MaxRkNNT route are almost the same. The difference is that MaxRkNNT travels 10 meters more but attracts an additional 129 passengers.

We ran into many challenging problems while calculating public transport capacity. Database query techniques made these problems easier and let us solve them quickly.

Capacity & Transfer Optimization Route Planning Algorithm

After I arrived in New York, the ongoing optimization of the city's bus routes gave me new inspiration, and I further improved the capacity and transfer optimization route planning algorithm. In addition to the original trajectory data and bus network data, road network data was added as a new data source.

In the figure below, the gray circles are road network nodes, and the numbers are the demand of each edge (the number of people who need to take the bus). The black squares are bus stops, and the blue lines are existing bus routes. The red lines are pedestrian trajectories, mapped onto the network through map matching. The figure shows a pedestrian going from road network node v5 to bus stop No. 7, with a section in the middle that has to be walked because no bus serves it. Trajectories represent commuting needs, and they can become potential new bus routes if no existing bus route covers their path.

Therefore, we need to plan a new bus route with at most k stops that maximizes the target values (capacity and transfer convenience). That is, the new route must keep transfers convenient while preserving the overall connectivity of the bus network. Here we also pay attention to another attribute of public transportation, fairness: if only capacity is pursued, bus routes concentrate in densely populated areas and transfers on suburban routes become very inconvenient. As a data science researcher, I think it is necessary to focus more on the interests of ordinary people.

1. Problem definition and indicator construction

How do we define the connectivity of a transit network? First, we can convert the graph above into an adjacency matrix and use its eigenvalues to judge how tightly the lines are connected. In addition, we consider traffic demand, that is, the degree of overlap between bus routes and commuter trajectories.

Bus network connectivity calculation:

Calculation of commuting needs:
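
One simple way to quantify this overlap is sketched below. It assumes each trajectory has already been map-matched to a sequence of road network node ids, and it treats the demand of an edge as the number of trajectories that traverse it. The function names and sample data are hypothetical, and the paper's exact definition may differ.

```python
from collections import Counter


def edge_demand(matched_trajectories):
    """Count how many map-matched trajectories traverse each undirected road edge."""
    demand = Counter()
    for traj in matched_trajectories:
        # Use a set so a trajectory counts at most once per edge.
        edges = {frozenset(e) for e in zip(traj, traj[1:])}
        demand.update(edges)
    return demand


def route_demand(route, demand):
    """Sum the demand of the edges covered by a candidate route."""
    return sum(demand[frozenset(e)] for e in zip(route, route[1:]))


# Three made-up commuter trajectories over road network nodes v1..v5.
matched = [["v1", "v2", "v3"], ["v2", "v3", "v4"], ["v5", "v2", "v3"]]
demand = edge_demand(matched)
candidate_demand = route_demand(["v1", "v2", "v3", "v4"], demand)
```

Edges that no trajectory uses contribute zero, so a route through an area with no commuting demand scores low on this term.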

2. CT-Bus problem

We name this bus route optimization problem CT-Bus: given commuter trajectory data and bus network data, plan a new route with at most k edges that maximizes the target value.

Linear Combination Optimization Objective:

We use a linear combination to trade off the two objectives above, where w is a configurable weight that can be tuned to suit different planning requirements.
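
A linear trade-off of this kind can be written as below. This is a sketch only: the symbols O_d (normalized demand met by the new route μ), O_c (normalized connectivity of the network after adding μ), and the range of w are illustrative assumptions, not necessarily the paper's notation.

```latex
O(\mu) = w \cdot O_d(\mu) + (1 - w) \cdot O_c(\mu), \qquad 0 \le w \le 1
```

Under this form, w = 1 plans purely for demand and w = 0 purely for connectivity.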

3. Measurement method of public transport network connectivity

If you have studied graph theory, you will know the concept of "edge connectivity". In plain terms, it is the minimum number of edges that must be removed from a graph to make it disconnected or trivial. The same idea applies to bus routes. For example, the edge connectivity of some links to the suburbs is 1: if that one link is cut, the network is no longer connected.

We then noticed that protein research in biochemistry uses a concept called natural connectivity, so we introduced it into the study of public transport networks, using the eigenvalues of the network's adjacency matrix to calculate a connectivity index in [0, 1]. It is arguably the most suitable measure for transportation networks because it does not change as drastically as algebraic connectivity or edge connectivity.

To verify the monotonicity of this measure in real transportation networks, we randomly and gradually removed existing routes from the Chicago and New York City transportation networks and observed an almost linear decrease in natural connectivity, as shown in the figure below.
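
Natural connectivity is commonly defined as the logarithm of the average of e^λ over the eigenvalues λ of the adjacency matrix. The sketch below computes that standard, unnormalized value in pure Python. It relies on the identity that the sum of e^λ equals the trace of the matrix exponential, and it approximates that trace with a truncated Taylor series, which is an assumption suitable only for small graphs. Any normalization into [0, 1] is not included. In practice an eigenvalue solver would be used instead.

```python
from math import log


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def natural_connectivity(adj, terms=60):
    """ln((1/n) * sum_i exp(lambda_i)) for a symmetric 0/1 adjacency matrix.

    Uses sum_i exp(lambda_i) = trace(exp(A)) and a truncated Taylor series
    for exp(A), so no eigenvalue solver is needed.
    """
    n = len(adj)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    trace_exp = float(n)
    for k in range(1, terms):
        power = mat_mul(power, adj)
        power = [[x / k for x in row] for row in power]  # now A^k / k!
        trace_exp += sum(power[i][i] for i in range(n))
    return log(trace_exp / n)


# Three stops connected in a line, in a triangle, and not connected at all.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
isolated = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
nc_path = natural_connectivity(path)
nc_triangle = natural_connectivity(triangle)
nc_isolated = natural_connectivity(isolated)
```

Removing the edge that closes the triangle lowers the value smoothly rather than flipping a yes/no property, which is the behavior the measure is chosen for.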

Summary of method ideas

Reviewing our research, we first proved that the CT-Bus problem is NP-hard (NP stands for non-deterministic polynomial time). Because it combines two complex constrained optimization problems (meeting passenger commuting demand and maximizing transit network connectivity), there is no approximation ratio.

A straightforward solution is to generate a large number of candidate routes, calculate the connectivity of each, and select the route with the highest target value. Since the objective function of CT-Bus is computationally expensive (computing connectivity requires the eigenvalues of the adjacency matrix), we propose a general algorithm that expands, sorts, and prunes candidate paths in the network.

While improving the algorithm, we adopted a preprocessing acceleration technique: first select some seed paths, keep adding edges at both ends, and keep estimating both the demand and the connectivity of the newly added edges or routes. Through continuous improvement and optimization, we reduced the computation time from days to tens of seconds while preserving the accuracy of the results.

Extended traversal algorithm

Meanwhile, when solving the CT-Bus problem, we adopt an extension-based graph traversal method.

1. Initialization stage

Select expansion seeds in descending order of demand (expansion starts from high-demand edges); use a shortest path search to connect two stops; accumulate the demand of each network edge that the path traverses.

2. Expansion stage

Add adjacent edges as new candidates using breadth-first search; the search strategy can take all neighbors or only the best ones, which effectively alleviates the convergence problem; calculate the target value, update the best path, and estimate the upper bound of the target value.

3. Inspection stage

Calculate the connectivity of the candidates and check whether the target value can be improved; if so, run several checks that limit the number of turns in the path and avoid repeated expansions; update the result once the checks pass.
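
The three stages can be sketched as a breadth-first expansion from high-demand seed edges. This is a simplified illustration, not the paper's algorithm: the score function is passed in by the caller (here plain total demand stands in for the combined target value), the upper-bound pruning and turn checks are reduced to a no-revisit rule, and all names and data are hypothetical.

```python
from collections import deque


def plan_route(graph, demand, k, score, top_seeds=3):
    """Breadth-first expansion for a simple path with at most k edges (k >= 1)."""
    # Initialization: seed with the highest-demand edges.
    edges = {frozenset((u, v)) for u in graph for v in graph[u]}
    seeds = sorted(edges, key=lambda e: demand.get(e, 0), reverse=True)[:top_seeds]
    queue = deque(tuple(sorted(e)) for e in seeds)
    seen = set()
    best_path, best_score = None, float("-inf")
    while queue:
        path = queue.popleft()
        key = min(path, path[::-1])  # a path and its reverse are the same route
        if key in seen:
            continue  # avoid repeated expansion
        seen.add(key)
        # Inspection: evaluate the candidate and keep the best one so far.
        value = score(path)
        if value > best_score:
            best_path, best_score = path, value
        if len(path) - 1 >= k:
            continue  # edge budget used up
        # Expansion: add adjacent edges at both ends, never revisiting a node.
        for v in graph[path[-1]]:
            if v not in path:
                queue.append(path + (v,))
        for v in graph[path[0]]:
            if v not in path:
                queue.append((v,) + path)
    return best_path, best_score


# Made-up road network and edge demand.
graph = {"a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"], "d": ["c"], "e": ["b"]}
demand = {frozenset(e): d for e, d in [
    (("a", "b"), 5), (("b", "c"), 9), (("c", "d"), 4), (("b", "e"), 1)]}


def total_demand(path):
    """Stand-in target value: demand summed over the path's edges."""
    return sum(demand.get(frozenset(e), 0) for e in zip(path, path[1:]))


best, value = plan_route(graph, demand, k=3, score=total_demand)
```

Replacing total_demand with a function that also adds a weighted connectivity term would turn this into a two-objective search, at the cost of a much more expensive score evaluation per candidate.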

Results: bus routes planned with spatiotemporal data

To make the results easier to understand, we also use visualization to display the final planning outcome.

Data set selection

In terms of datasets, we mainly selected three types of data from New York City and Chicago: GTFS bus data, road network data, and taxi trajectory data.

Display and analysis of new planning routes

In the image below, the newly planned route is shown as a thick dark red line. Our effectiveness analysis shows that the proposed method generates effective routes with a high target value and keeps a balance between demand and connectivity. The planned routes also have a smoother shape on the map, which gives them reference value for real urban public transport planning.

For New York City, more routes need to be built between Queens and Brooklyn, which will in turn connect more routes to Staten Island. Manhattan's existing subway and bus systems are very mature, so the improvement in connectivity there is not obvious and newly planned bus routes are not needed. This is consistent with the fact that New York City is redesigning bus routes in the four boroughs other than Manhattan. However, we recommend planning more routes connecting Manhattan and Staten Island, which depends heavily on the bus system because the island has only one internal subway line. The Bronx also needs new routes that connect north and south, forming a circle linking Yankee Stadium, Hunts Point Avenue, and Kingsbridge, to significantly reduce commuter transfers.

The main performance indicators of the algorithm

We compare the results before and after optimizing the algorithm. In terms of time efficiency, ETA in the figure below is our proposed database algorithm, and the precomputation-based estimation technique (ETA-Pre) finds the optimal route very efficiently.

New York Live Transit Platform

Our research group recently developed an interactive real-time bus visualization platform ( http://shengwang.site/bus/ ), which is also built on real open datasets. In the future we will integrate datasets from more cities, with a particular focus on supporting domestic cities such as Wuhan.

Closing remarks

I believe public spatiotemporal datasets can effectively support the development of intelligent public transportation. Intelligent route planning driven by spatiotemporal data is usually very complex, but database technology can greatly reduce the planning overhead. As more spatiotemporal data becomes public, it will benefit more areas of public welfare, such as real-time bus trajectory data management and analysis, case trajectory tracking, bus schedule optimization, and arrival time prediction.

That concludes the written record of Professor Wang Sheng's talk from the latest live broadcast. We hope everyone found it useful!
