数据库 - Towards a Cloud-Native Database - Code Practices for Accelerating SQL Server Migration with Babelfish - 亚马逊云开发者

1 Introduction

While there is no doubt that traditional commercial licensing databases are rich in features and well-supported, their strict pricing models, cumbersome licensing terms, and high total cost of ownership (TCO) make enterprises want to adopt lower-cost open source solutions . In some ways, open source databases offer the same or even better functionality at a lower cost. Migrating from a commercial database to an open source database can provide businesses with significant cost savings in licensing and support.

PostgreSQL is an enterprise-grade, feature-rich open source database system that is highly reliable and performant, ideal for real-time and mission-critical applications. Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Aurora is fully compatible with MySQL and PostgreSQL, enabling existing applications and tools to run without modification. It triples performance and increases scalability, durability, and security over a typical PostgreSQL database.

Migration from a traditional SQL Server database can be time-consuming and resource-intensive, and any migration involves three main steps: moving the schema, migrating the data, and modifying the client application. As we can see in the image below: When migrating a database, you can use the Amazon Schema Conversion Tool (SCT) in conjunction with Amazon Database Migration Service (DMS) to automatically migrate the database schema and data, but when migrating the application itself, you often need to do more Work, including rewriting application code that interacts with the database, and migrating T-SQL code into PL/pgSQL is complex, time-consuming, and risky.

Babelfish for Aurora PostgreSQL is a new feature of the Amazon Aurora PostgreSQL-compatible version that understands Microsoft SQL Server's proprietary SQL language, T-SQL, and supports the same communication Moving to Aurora will require less effort, allowing for a faster, less risky, and more cost-effective migration.

Babelfish supports T-SQL and SQL Server behavior by supporting Aurora PostgreSQL's Microsoft SQL Server data types, syntax, and functions. Note, however, that Babelfish does not provide 100% complete support for T-SQL, there are still some differences and limitations , and manual transcoding is required in some cases.

This article will list and demonstrate some high-frequency and common typical code conversion cases to help you complete the migration work more efficiently and quickly.

2. Environmental preparation

Before starting our demonstration, assuming that in your working environment, there is already a SQL Server source library ready to be migrated, then in addition, you also need to set up the following related components:

Babelfish Compass

This is an open source syntax evaluation tool for SQL Server migration to Babelfish, available for download on GitHub . It can run on Windows and Linux platforms, requires Java environment support, the current version is v2022-04

Babelfish for Aurora PostgreSQL

Babelfish has released the first version 1.0.0 in the fall of 2021, and the current version has been updated to 1.2.1, and the corresponding Aurora PostgreSQL version is 13.6. You can follow the official documentation to create a Babelfish for Aurora PostgreSQL cluster environment in just a few simple steps. In the configuration process, you need to pay attention to the selection of database migration mode, and if there is Chinese data, please select "chinese_prc_ci_as" in the collation rule

So far, an environment with SQL Server source and Aurora PostgreSQL target and migration assessment tool is ready. Next, please refer to this blog, you can generate a Babelfish migration assessment report in just a few minutes.

3. Code conversion

3.1 Conversion assessment

The assessment report generated by the Babelfish Compass tool is a guide for assessing the content and effort of the migration, and you can write the SQL code conversion content one by one according to the items listed in it that need to be modified.

The Summary section of the assessment report lists the T-SQL syntax feature compatibility statistics for migrating SQL Server sources to Babelfish targets, including supported, unsupported, semantic review, manual review, and ignorable items. The most important thing is the content of unsupported features. If these SQL statements contain unsupported features, if they are not modified, most executions in the Babelfish for Aurora PostgreSQL environment will report an error: "'???' is not currently supported in Babelfish" , and some other SQL statements will not actually take effect although no error is reported.

In the evaluation report, we can view the SQL classification statistics of these unsupported features. The evaluation report in the figure below lists each type of SQL statements that do not support features. It shows that the DDL script used in our case is not supported in Babelfish The main features include adding constraint statements to tables, Merge statements, modifying databases, modifying roles, and executing some system stored procedures.

3.2 Conversion principle

Babelfish provides an additional endpoint for the Aurora PostgreSQL database cluster to understand the SQL Server wire-level protocol and common SQL Server statements. After the migration, you can still use the same T-SQL development tools and drivers to connect to the TDS port to complete related development. You can also use a native PostgreSQL connection to do development on the PostgreSQL side, and call from the T-SQL side. This compatibility mode can help us solve most of Babelfish's compatibility issues with T-SQL.

Select conversion mode: As mentioned above, for some unsupported SQL statements, we can choose to rewrite them in T-SQL, or modify them in PostgreSQL and call them from T-SQL. The principle of conversion depends on the connection development mode of the application. For example, if a .net application is connected to TDS-side development, then the preferred conversion mode is to convert in T-SQL. If it cannot be rewritten on the T-SQL side or there is a performance problem after the modification, you can try to modify it in PostgreSQL.
Code readability: There may be several ways to rewrite the SQL statement to be modified. Simplicity, efficiency, and readability are always preferred. For example, in most cases, using the Case statement is easier to understand than using ..Then.

3.3 Simple code conversion

Code transformations that fall into this category are characterized by simple modifications, but often account for the vast majority of all SQL statements listed in evaluation reports that do not support features. Such transcoding work generally requires only masking relevant options, commenting out entire statements, or simply modifying them. The reason for this modification is due to the feature difference between PostgreSQL and SQL Server or the limitation of Babelfish. Some options or operations in SQL Server are not supported by Babelfish and will not affect the function execution and can be ignored directly. Although this kind of SQL statement is simple to rewrite, it can achieve the same effect.

Before the demonstration, let's look at the structure of the two tables that will be used next:

 create table dept(
    deptno int NOT NULL PRIMARY KEY,
    dname varchar(14),
    loc varchar(13)
) 

create table employees (
    empno int NOT NULL PRIMARY KEY,
    ename varchar(10),  
    job varchar(9),  
    mgr int,  
    hiredate datetime,  
    sal money,  
    comm money,  
    deptno int
)

ALTER TABLE..CHECK CONSTRAINT

original sentence

 ALTER TABLE [dbo].[employees]  WITH CHECK ADD CONSTRAINT [FK_DEPT] FOREIGN KEY([deptno])
REFERENCES [dbo].[dept] ([deptno])
GO
ALTER TABLE [dbo].[Employees] CHECK CONSTRAINT [FK_DEPT]
GO

Modified statement

 ALTER TABLE [dbo].[employees] ADD CONSTRAINT [FK_DEPT] FOREIGN KEY([deptno])
REFERENCES [dbo].[dept] ([deptno])
GO

illustrate:

CHECK CONSTRAINT statement is not supported in Babelfish to enable table constraints, ALTER table constraints will be automatically enabled after adding constraints
When adding constraints in Babelfish, WITH CHECK/NOHECK options are not supported for constraint checking on existing data
This kind of unsupported ALTER TABLE statement is the most common in the migration process. Generally, after modification, create a new table and constraints on Babelfish, and then import the data of the table. The constraints of the table will automatically check the imported data to ensure that the data constraints are valid.

ALTER ROLE..

original sentence

 ALTER ROLE [???] ADD MEMBER [NT AUTHORITY\SYSTEM]
GO

Modified statement

 /* ALTER ROLE [???] ADD MEMBER [NT AUTHORITY\SYSTEM]
GO */

illustrate:

Currently Babelfish only supports the dbo user in the user database, you cannot create a user with lower privileges, such as read-only privileges on some tables
Most of these statements are DDL statements poured out after the user logs in to the SQL Server source with operating system privileges. You can directly comment out the masked statements.

ALTER DATABASE..

original sentence

 ALTER DATABASE [???] SET RECOVERY FULL 
GO

Modified statement

 /* ALTER DATABASE [???] SET RECOVERY FULL 
GO */

illustrate:

Babelfish does not support the ALTER DATABASE syntax. Aurora PostgreSQL is a fully managed database that restricts some database modification statements. These statements can be directly commented and masked.

ALTER AUTHORIZATION ON object

original sentence

 ALTER AUTHORIZATION ON [dbo].[employees] TO  SCHEMA OWNER 
GO

Modified statement

 /* ALTER AUTHORIZATION ON [dbo].[employees] TO  SCHEMA OWNER 
GO */

illustrate:

Babelfish does not support the creation, modification and deletion of AUTHORIZATION, and can be directly commented and shielded

EXEC sys.sp_addextendedproperty

original sentence

 EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'编号' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'dept', @level2type=N'COLUMN',@level2name=N'deptno'
GO

Modified statement (modified on the PostgreSQL side)

COMMENT ON COLUMN dept.deptno IS '编号';
illustrate:

Babelfish does not support the use of the system stored procedure sp_addextendedproperty to add a description to a field. You can directly comment to shield this SQL statement, and connect to the PostgreSQL side to use comment to add a field description

OBJECTPROPERTY

original sentence

 select name from sysobjects where objectproperty(id, N'IsTable') = 1 and name not like N'#%%' order by name
select * from sysobjects where id = object_id(N'temp_tableSpaceInfo') AND objectproperty(id, N'IsUserTable') = 1

Modified statement

 select name from sysobjects where xtype in ('U','IT','S') and name not like N'#%%' order by name
select * from sysobjects where id = object_id(N'temp_tableSpaceInfo') AND xtype='U'

illustrate:

Babelfish does not support the built-in metadata function OBJECTPROPERTYEX, which can be appropriately rewritten according to SQL semantics

SET ROWCOUNT

original sentence

 CREATE PROCEDURE [dbo].[P_Rowcount]
@id int
as
set nocount on
set rowcount @id 
begin
select * from employees order by empno
end
GO

Modified statement

 CREATE PROCEDURE [dbo].[P_Rowcount]
@id int
as
set nocount on
begin
select top (@id) * from employees order by empno
end
GO

The second modification

 CREATE PROCEDURE [dbo].[P_Rowcount]
@id int
as
set nocount on
begin
select * from employees order by empno offset 0 rows fetch first @id rows only;
end
GO

illustrate:

Babelfish does not support the SET ROWCOUNT statement to return the specified number of rows, which can be appropriately rewritten according to SQL semantics. From the example, we can see that there are many ways to rewrite. In complex business scenarios, the choice should be made in terms of code readability and performance impact

CURRENT OF

original sentence

 CREATE PROCEDURE [dbo].[P_CurrentOf] AS
BEGIN
  DECLARE @empno int
  DECLARE NoResponce CURSOR FOR
    SELECT empno FROM employees;
  OPEN NoResponce;
  FETCH NEXT FROM NoResponce INTO @empno;
  DELETE FROM employees WHERE CURRENT OF NoResponce;
END
GO

Modified statement

 CREATE PROCEDURE [dbo].[P_CurrentOf] AS
BEGIN
  DECLARE @empno int
  DECLARE NoResponce CURSOR FOR
    SELECT empno FROM employees;
  OPEN NoResponce;
  FETCH NEXT FROM NoResponce INTO @empno;
  DELETE FROM employees WHERE empno = @empno;
END
GO

Remark:

Where Current Of statement allows you to update or delete the last record fetched by cursor, Babelfish does not support Current Of statement, you can select variables according to the context semantics of the SQL statement

IDENTITY

original sentence

 SELECT IDENTITY(INT,1,1) AS rowid,* INTO #tmp
FROM employees
ORDER BY empno

Modified statement

 SELECT row_number() over () as rowid, * INTO #tmp
FROM employees
ORDER BY empno

illustrate:

Babelfish does not support the IDENTITY function, which is used to insert an identity column into a new table in a SELECT statement with an INTO clause, which can be rewritten using row_number() over ()

3.4 Complex code conversion

Compared with the simple code conversion introduced earlier, the following SQL statements will be more complicated and modified. At the same time, you also need to carefully review the relationship between the contexts in the SQL statement to ensure that the modified statement and the original statement have the same effect.

MERGE

Before this case demonstration, create two source and target tables used by MERGE

 create table source
(
   id      int not null primary key ,
   country varchar(20) null,
   city    varchar(20)
);

insert into source
  (id, country, city)
 VALUES
  (1, 'RUSSIA',  'MOSCOW'),
  (2, 'FRANCE',  'PARIS'),
  (3, 'ENGLAND', 'LONDON'),
  (4, 'USA',     'NEW YORK'),
  (5, 'GERMANY', 'BERLIN'),
  (6, 'BRAZIL',  'BRASILIA');

create table target
(
   id      int not null primary key ,
   country varchar(20) null,
   city    varchar(20)
);

insert into target
  (id, country, city)
 VALUES
  (1, 'JAPAN',   'TOKYO'),
  (4, 'USA',     'DENVER'),
  (7, 'CHINA',   'BEI JING');

original sentence

 MERGE INTO target AS C2
USING source AS C1 
ON C2.id = C1.id
WHEN MATCHED
   THEN UPDATE 
      SET
         C2.country = C1.country,
         C2.city = c1.city
WHEN NOT MATCHED
   THEN INSERT (id, country, city)
            VALUES (C1.id, C1.country, C1.city);

Modified statement

 begin
update target set country = C1.country, city = C1.city from (select id, country, city from source) C1 where target.id = C1.id;
insert into target (id, country, city) select * from source as C1 where not exists (select id from target where id = C1.id);
end
go

The second modification (modification on the PostgreSQL side)

 with upsert as
(update target c2 set country=c1.country, city=c1.city 
 from source c1 where c1.id=c2.id
  RETURNING c2.*
)
insert into target select a.id, a.country, a.city 
from source a where a.id not in (select b.id from upsert b);

The third modification (modification on the PostgreSQL side)

 insert into target (id,country,city) select id,country,city 
from source
on conflict (id)
do update set country=excluded.country,city=excluded.city;

illustrate:

MERGE is a commonly used data merge update statement. Babelfish does not support MERGE statement. Generally speaking, it can be split into multiple DML statements in T-SQL according to SQL semantics, and it can also be equivalently rewritten on the PostgreSQL side.
PostgreSQL currently does not support the MERGE statement, which can be implemented using the UPSET or CONFLICT statement. The execution cost of INSERT ON CONFLICT is less than that of the UPDATE statement.

FULLTEXT full text search

Babelfish does not support full-text search of SQL Server, and does not support the following statements and system stored procedures

 CREATE、ALTER、DROP FULLTEXT CATALOG
CREATE、ALTER、DROP FULLTEXT INDEX
CREATE、ALTER、DROP FULLTEXT STOPLIST
exec sp_fulltext_database 'enable';

The Amazon Aurora PostgreSQL-compatible version adds support for the pg_bigm extension. The pg_bigm extension provides full-text search functionality in PostgreSQL. This extension allows users to create 2-grams (double groups) to improve full text search speed. The following example demonstrates how to enable full-text search functionality through an extension on the PostgreSQL side

 set search_path=dbo;
create extension pg_bigm;

CREATE TABLE fulltext_doc (doc text);
INSERT INTO fulltext_doc VALUES('Babelfish助力SQL迁移 成本优化');
INSERT INTO fulltext_doc VALUES('Babelfish助力SQL迁移 性能优化');
INSERT INTO fulltext_doc VALUES('Babelfish助力SQL迁移 提升使用体验');
INSERT INTO fulltext_doc VALUES('Babelfish助力SQL迁移 中提供 2-gram 全文搜索功能的工具');
INSERT INTO fulltext_doc VALUES('Babelfish助力SQL迁移 中提供 3-gram 全文搜索功能的工具');

CREATE INDEX fulltext_doc_idx ON fulltext_doc USING gin (doc gin_bigm_ops);
alter table fulltext_doc owner to dbo;

After the full-text search is successfully set, it can be called through T-SQL on the TDS port

SWITCHOFFSET

original sentence

 SELECT CAST(SWITCHOFFSET(TODATETIMEOFFSET(SYSUTCDATETIME(),'+00:00'),'+08:00') AS DATETIME)

Modified statement (create custom function on PostgreSQL side)

 CREATE OR REPLACE FUNCTION dbo.f_get_cst()
RETURNS sys.datetime AS $$ 
  BEGIN
      RETURN cast(timezone('Asia/Shanghai',now()) as sys.datetime);
END;
$$ LANGUAGE plpgsql;

illustrate:

Babelfish does not support time zone offset built-in functions such as SWITCHOFFSET and TODATETIMEOFFSET. You can create a custom function on the PostgreSQL side and call it through T-SQL on the TDS port to achieve the same function

XML method

Before the demonstration of this case, create a table related to xml parsing

 create table t_xml_test (
  id int,
  country  nvarchar(max),
  industry nvarchar(max)
); 

insert t_xml_test values(1, 'China', 'Manufacturing and foreign trade business');
insert t_xml_test values(2, 'USA', 'Financial and Bioindustry');
insert t_xml_test values(3, 'Russia', 'Resource export');

original sentence

 create procedure p_xml_test
 @xml xml
as
begin    
   set nocount on
   select * from t_xml_test
   where id in (select imgXML.Item.value('id[1]','int') from @xml.nodes('/root/country') as imgXML(Item)); 
   set nocount off
end
go

Modified statement (first create a custom function on the PostgreSQL side to parse the XML)

 CREATE OR REPLACE FUNCTION xmlQueryID(in_xml xml) 
RETURNS TABLE (id text) 
AS $$
DECLARE
BEGIN
  RETURN QUERY
    select * from (
      WITH xmldata(data) AS (VALUES (in_xml::xml))
      SELECT xmltable.*
      FROM XMLTABLE('/root/country' PASSING (SELECT data FROM xmldata) COLUMNS id text)) as foo;
END;
$$ LANGUAGE plpgsql;

Connect the TDS port to modify the SQL statement in T-SQL and call the custom function created on the PostgreSQL side

 create procedure p_xml_test
 @xml xml
as
begin    
   set nocount on    
   select * from t_xml_test
   where id in (select * from xmlQueryID(@xml)) ;
   set nocount off
end
go

When calling the stored procedure test in T-SQL, the query result shows that the xml parsing is normal and the data is displayed correctly

illustrate:

Babelfish does not support methods for parsing XML data, including VALUES, XML.NODES and other methods. You can create custom functions on the PostgreSQL side and call T-SQL on the TDS port to complete the parsing of XML data.

4. Summary

Through the previous case introduction, we have shown you some of the most common conversion methods that do not support feature SQL when migrating SQL Server using Babelfish. Currently, the Babelfish for PostgreSQL project continues to move forward and the version is constantly updated. Each new release adds significant functionality, including increased syntax compatibility and support for SQL Server native functionality. It is recommended that you frequently check Babelfish's feature support notes when planning and implementing your SQL Server migration, and use the latest feature support to complete your code conversion. At the same time, on October 28, 2021, Amazon Cloud Technology officially announced the launch of the Babelfish for PostgreSQL open source project. This move enables users to utilize Babelfish on their own PostgreSQL servers.

For more detailed code conversions for migrating SQL Server to Amazon Aurora PostgreSQL, please refer to the official migration manual , but please note that these conversions are rewritten on the PostgreSQL side, and you need to consider how to call them on the T-SQL side.

Click to learn about the panorama of Amazon cloud technology database services , and start the trial immediately

Author of this article

Tang Xiaohua

Amazon cloud technology database solution technical expert, more than 20 years of experience in the database industry, responsible for technical consulting and solutions based on Amazon cloud computing database products. Focus on cloud relational database architecture design, testing, operation and maintenance, optimization and migration.

Towards a Cloud-Native Database - Code Practices for Accelerating SQL Server Migration with Babelfish

1 Introduction

2. Environmental preparation

3. Code conversion

3.1 Conversion assessment

3.2 Conversion principle

3.3 Simple code conversion

3.4 Complex code conversion

4. Summary

Author of this article

亚马逊云开发者

引用和评论

翰德 Hudson 携手亚马逊云科技，基于 MCP Agent 重塑智能招聘新范式

做到真正0丢失、0重复：Apache SeaTunnel 实现万亿级数据一致性全解密

Devin 发布 DeepWiki，2 星的项目直接装出万星的气场

好用的开源埋点方案-ClkLog埋点用户分析系统

【TVM教程】为 ARM CPU 自动调度神经网络

DNS服务器地址大全

JManus - 面向 Java 开发者的开源通用智能体