Abstract: PostGIS provides spatial database analysis capabilities for PostgreSQL. It is one of the mainstream geographic databases in the industry and provides the following spatial information service functions: spatial objects, spatial indexes, spatial operation functions, and spatial operators. In GaussDB, PostGIS geographic database expansion is currently supported, and it has been widely used in domestic and foreign public security, agriculture, security and other government and enterprise customers.

This article is shared from the HUAWEI CLOUD community " shocked, PostGIS can also be used like this! ! ! ", the original author: The Language of Autumn.

1. What can geodatabases do

The geographic database is a spatial database, which provides a standard format and storage method for geographic data, which can be easily and quickly retrieved, updated, and data analyzed, and ultimately serves the purpose of serving a variety of applications. Geographical data includes observation data, analytical measurement data, remote sensing data and statistical survey data. The geographic database has been widely used in bicycle, navigation, tourism, water conservancy, agriculture, Anping city and other application scenarios, and has penetrated every bit of people's lives.
image.png

Figure 1. Typical application scenario of geodatabase

Two, PostGIS function introduction

For the usage scenarios described above, geographic data is usually stored as a collection of points, lines, or polygons. In PostgreSQL, spatial data types such as points, lines, and polygons have been provided, but the data processing methods and performance provided by it are difficult to meet the requirements of GIS. The main manifestations are: lack of complex spatial types; no spatial analysis; no Projection transformation function. In order to enable PostgreSQL to provide better spatial information services, PostGIS came into being.

2.1 PostGIS supports data types

PostGIS fully complies with the OpenGIS specification and supports all spatial data types in OpenGIS:

  • a.POINT, LINESTRING, POLYGON, MULTI-POINT,
  • b.MULTI-LINESTRING, MULTI-POLYGON,
  • c.GEOMETRY COLLECTION

In addition to the geographic data types defined by OpenGIS, PostGIS has also extended the data types, extending the EWKT and EWKB data types based on the WKT and WKB data types:

  • a. EWKT, EWKB (WKT/WKB containing SRID information)
  • b. SRID (Spatial Referencing System Identifier): Each spatial instance has a spatial reference identifier (SRID). SRID corresponds to a spatial reference system based on a specific ellipsoid, and can be used for flat sphere mapping or round sphere mapping.

In addition, PostGIS also supports raster analysis of raster data, which can realize statistical analysis of different types of image or satellite data based on existing image or satellite data.

2.2 PostGIS supports function types

The common functions of PostGIS can be roughly divided into the following six categories. For the specific usage of each function, please refer to the "PostGIS User Manual":

1. Field processing functions

a. AddGeometryColumn adds a geographic geometry data field to the existing data table;
b. DropGeometryColumn deletes a geographic data field;
c. ST_SetSRID set SRID value

2. Geometric relation function

This type of function describes the distance, inclusion, range, phase and other geometric relations of geometric objects. Common functions are as follows: ST_Distance, ST_Equals, ST_Disjoint, ST_Intersects, ST_Touches, ST_Within, ST_Overlaps, ST_Contains.

3. Read and write functions

This kind of function is mainly used for the conversion between various data types, especially the conversion between the Geometry data type and other character data types, such as ST_AsText, ST_GeomFromText, ST_AsGeoJSON ST_AsHEXEWKB, ST_AsKML, ST_AsLatLonText.

4. Geometric object creation function

This type of function is used to create geometric objects such as points, lines, and multiple deformations, such as ST_GeomFromEWKT, ST_GeomFromEWKB, ST_MakePoint, ST_MakeBox2D, ST_LineFromText, ST_Polygon.

5. Geometric object editing functions

This type of function provides functions such as ST_AddPoint, ST_Reverse, ST_Rotate, ST_Scale, ST_Snap, ST_Transform, ST_Translate, ST_TransScale, and other functions for the translation, flip, rotation, and enlargement of geometric images.

6. Spatial relations and measurement functions

This type of function realizes the calculation of the farthest, nearest, length, area of geometric objects, such as ST_3DClosestPoint, ST_3DDistance, ST_3DDWithin, ST_3DDFullyWithin, ST_3DIntersects, ST_3DLongestLine, ST_3DMaxDistance, ST_3DShortestLine, ST_Area.

3. Installation of PostGIS:

3.1 Basic compilation environment preparation:

The PostGIS Extension in GaussDB (DWS) needs to be compiled and installed using GCC and G++ tools. Before installation, confirm that the GCC and G++ version numbers are greater than or equal to 4.8.5, and the two tools can be used normally. Specifically, you can check the relevant version through gcc -v and g++ -v.
image.png

As shown in the figure above, the gcc and g++ version of the environment is 4.3.4, which does not meet the version requirements, and you need to use the source code installation method to upgrade. If there is no lower version of gcc and g++ compiler in the cluster, you can install it by mounting the operating system image, which will not be repeated here. If you need to upgrade, you can obtain GCC related installation packages from the following website:

https://ftp.gnu.org/gnu/gcc/gcc-5.4.0/gcc-5.4.0.tar.gz

https://ftp.gnu.org/gnu/gmp/gmp-4.3.2.tar.gz

https://ftp.gnu.org/gnu/mpfr/mpfr-2.4.2.tar.gz

https://ftp.gnu.org/gnu/mpc/mpc-1.0.3.tar.gz

In addition, PostGIS installation also needs to ensure that tools such as zlib, autoconf, and automake have been installed correctly.

3.2 PostGIS dependency library installation:

PostGIS relies on third-party open source tools such as Geos, Proj, JSON-C, Libxml2, and Gdal. Before security, you first need to download the source code of Geos, Proj, JSON-C, Libxml2, Gdal, PostGIS to the $GAUSSHOME directory.

During installation, switch to the omm user, and check that the GaussDB environment variables have been loaded correctly, and you can log in and use the GaussDB environment normally. For specific installation commands, please refer to the product documentation, and ensure that the installation path is exactly the same as the path given in the documentation. A custom path will cause the final library file distribution to fail.

During the entire installation process, you can use the make -sj and make install -sj commands to speed up the compilation in parallel. The -sj command has a very low probability of installation errors. If the installation fails, please use make and make install for serial installation. In addition, for ARM physical machines, the following compilation parameters need to be added when configuring each installation package: --build=aarch64-unknown-linuxgnu, otherwise the installation will fail. At the same time, during the installation of the third-party open source library proj, an error may be reported that the $GAUSSHOME/install/proj/bin directory does not exist. You can manually create the directory and then perform the proj installation.

After the installation is successful, the screenshot is roughly as follows, in which the prompt message about the non-existence of get_PlatForm_str.sh can be directly ignored.
image.png

After the installation is complete, you need to use the PostGIS_install.sh tool to complete the distribution of PostGIS related dynamic link libraries in the cluster nodes. The execution method is:

sh $GAUSSHOME/share/postgis/PostGIS_install.sh

If it fails, you can open the PostGIS_install.sh file and execute the distribution commands separately to determine which file does not exist that caused the distribution to fail, and further analyze the cause of the library file compilation failure.

Four, PostGIS use:

Currently, GaussDB supports pushdown to DN processing for most functions in PostGIS. Therefore, for most geographic data operations, you can take full advantage of GaussDB's distributed computing advantages and bring performance acceleration compared to PostgreSQL's approximate linear scaling ratio.

4.1 Import of geographic data:

PostGIS supports the import of geographic data in shape format. Therefore, for other geographic database platform data, such as geographic data in Oracle or ArcGIS, you can first export it to a shape format file, and then use the shp2pgsql tool in the $GAUSSHOME/bin directory to convert the shape file to a sql file and further import it into GaussDB. If the geographic table definition in the generated sql file lacks a distribution key, you can manually add a distribution key with self-increment id to distribute the geographic data evenly to each DN, and then make full use of the distributed computing advantages of GaussDB.

4.2 GIS function use demo:

For the specific usage method of each GIS function supported by GaussDB, please refer to "PostGIS-2.4.2 User Manual". Here is a simple demo.

Example 1: Creation of geometry table.
CREATE TABLE cities ( id integer, city_name varchar(50) );
SELECT AddGeometryColumn('cities', 'position', 4326, 'POINT', 2);
Example 2: Insertion of geometric data.
INSERT INTO cities (id, position, city_name) VALUES (1,ST_GeomFromText('POINT(-9.5 23)',4326),'CityA');
INSERT INTO cities (id, position, city_name) VALUES (2,ST_GeomFromText('POINT(-10.6 40.3)',4326),'CityB');
INSERT INTO cities (id, position, city_name) VALUES (3,ST_GeomFromText('POINT(20.8 30.3)',4326), 'CityC');
Example 3: Calculate the distance between any two cities among three cities.
SELECT p1.city_name,p2.city_name,ST_Distance(p1.position,p2.position) FROM cities AS p1, cities AS p2 WHERE p1.id > p2.id;

The results of the implementation are:
image.png

Example 4: Query the query execution plan and find that the p1 table is broadcast to all DN nodes during execution. Therefore, for each DN, it only needs to process the p2 data on its own node by comparing the local p2 data with the global p1 data. Complete the entire analysis, which is also the advantage of distributed databases.
image.png

Five, PostGIS performance introduction

The spatial databases currently on the market include MySQL's Spatial Extension, PostgreSQL's PostGIS, Oracle Spatial, ArcGIS's ArcSDE and MongoDB. For the performance comparison of these several databases, there is a document "Commonly Used Geographic Database Comparison Test" which has a more detailed comparison and introduction.

It can be seen from the test data in Figure 2 to Figure 5 that for point data, under the same query conditions, the spatial query speed of the PostGIS database is the fastest. For line data, PostGIS is slower than other databases, which may be related to the different indexing techniques used by different geographic databases.
image.png

Figure 2. The results of the first data check
image.png

Figure 3. The results of the second data check
image.png

Figure 4. Data results of the first online check
image.png

Figure 5. The second line check data result

As a distributed database, GaussDB has made a deep adaptation to PostGIS. Currently, GaussDB supports pushdown to DN processing for most functions in PostGIS. Therefore, for most geographic data operations, you can take full advantage of GaussDB's distributed computing advantages to bring performance acceleration compared to PostgreSQL's approximate linear scaling ratio, and meet customers' geographic data processing and analysis needs in big data scenarios.

Six, summary

This blog post briefly introduces the installation and use of Postgis in GaussDB. Readers are welcome to collect and discuss.

If you want to know more about GuassDB (DWS), welcome to search "GaussDB DWS" on WeChat and follow the WeChat public account, and share the latest and most complete PB-level digital warehouse black technology with you. You can also get a lot of learning materials in the background~

Click to follow, and get to know the fresh technology of


华为云开发者联盟
1.4k 声望1.8k 粉丝

生于云,长于云,让开发者成为决定性力量