
As the jewel in the crown of basic software, database technology has always been the focus of developers' attention. This attention is so high that it almost naturally bridges the gap between the academic and industrial circles, so that every important paper on database technology may result in the emergence of a group of companies worth billions of dollars.

Across the database industry, cloud databases have gradually become the center of attention in recent years. According to Gartner, by 2022, 75% of all databases will be deployed or migrated to a cloud platform, with only 5% ever considered for repatriation to on-premises. IDC believes that by 2025, more than 50% of the world's databases will be deployed on public clouds; in the Chinese market the figure is even higher, at more than 70%.

So the question is: if cloud databases, or cloud-native databases, are indeed the next frontier, what are their main technologies and development directions today? How should we view the development trend of cloud-native databases? Babelfish, released by Amazon Cloud Technology in 2020, may offer some clues.

Babelfish, an underestimated blockbuster release

Babelfish was unveiled at re:Invent 2020 by Andy Jassy, then CEO of Amazon Cloud Technology.

Put simply, Babelfish is a plug-in for the cloud database Amazon Aurora PostgreSQL that makes Aurora compatible with applications written for Microsoft SQL Server.

When Babelfish was first released, many engineers posted videos on YouTube saying they did not see the point. Ever since cloud databases emerged, migration services have been everywhere in the industry. Almost every public cloud provider offers them, although most target Oracle. A US company, EnterpriseDB, specializes in migrations from Oracle to PostgreSQL, and proxy layers and SQL dialect conversion tools keep appearing.

In fact, Amazon Cloud Technology already has migration services of its own, such as the AWS Schema Conversion Tool for schema migration and the AWS Database Migration Service for data migration.

So what is the point of Babelfish? Doesn't adding a proxy layer just increase back-end processing costs?

In fact, migrating only the schema and the data is not a complete migration, because the applications built on top of the database have not been migrated. In the scenario Babelfish targets, applications built on Microsoft SQL Server talk to the database in T-SQL, a dialect that differs substantially from PostgreSQL's. The application cannot move along with the database unless that part of it is rewritten.
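To make the dialect gap concrete, here is a minimal sketch of one well-known difference: T-SQL limits rows with `SELECT TOP n`, while PostgreSQL uses a trailing `LIMIT n`. The function name and the sample query are hypothetical, and the rewriter is deliberately a toy. It is not how Babelfish or any migration tool works; it only illustrates the kind of rewriting an application would otherwise need.

```python
import re

# Matches the leading "SELECT TOP <n>" of a simple, single-statement T-SQL query.
_TOP_CLAUSE = re.compile(r"^\s*SELECT\s+TOP\s+(\d+)\s+", re.IGNORECASE)


def rewrite_top_to_limit(tsql: str) -> str:
    """Rewrite 'SELECT TOP n ...' into 'SELECT ... LIMIT n;' (toy example only)."""
    match = _TOP_CLAUSE.match(tsql)
    if match is None:
        return tsql  # nothing this toy rewriter understands; leave it untouched
    # Drop the trailing semicolon so LIMIT can be appended before it.
    body = tsql[match.end():].rstrip().rstrip(";")
    return f"SELECT {body} LIMIT {match.group(1)};"


converted = rewrite_top_to_limit(
    "SELECT TOP 5 name FROM customers ORDER BY created_at DESC;"
)
print(converted)
# SELECT name FROM customers ORDER BY created_at DESC LIMIT 5;
```

Even this single construct has variants the sketch ignores (`TOP ... PERCENT`, `WITH TIES`, subqueries), which hints at why hand-rewriting an entire application is so expensive.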

This is also why database migration is so rare in the industry. It is not that nobody wants to migrate (after all, no one can guarantee that the initial architecture choice will always be right), but that the cost is too high.

A typical, general-purpose migration plan gives a sense of this cost:

Compared with such a heavyweight migration, wouldn't it be far more convenient if the database were compatible in the first place? That is the main reason Babelfish exists.

Many people underestimate Babelfish, probably because they see only its commercial significance and not its technical difficulty.

Oracle and PostgreSQL share many features, and converting between them is still difficult; converting between T-SQL and PostgreSQL is even more complicated. A faithful conversion has to get many intricate details right, including the query language, stored procedures, static cursors, triggers, and more.

Sébastien Stormacq of Amazon Cloud Technology once pointed out in a blog post that the MONEY type in T-SQL has four decimal places of precision, while in PostgreSQL it has only two. This small difference can lead to rounding errors that have a significant impact on downstream processes such as financial reporting.

He said: "In this case, Babelfish will ensure that the semantics of the SQL Server data type and T-SQL functionality are preserved: we created a MONEY data type to behave as expected by the SQL Server application."
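The effect of that precision gap can be worked through with plain decimal arithmetic. The sketch below is an illustration only: the fee and quantity are made-up values, `store_as_money` is a hypothetical helper, and half-up rounding is an assumption made for the example rather than a statement about how either database rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_as_money(value: str, scale: int) -> Decimal:
    """Round a decimal string to `scale` fractional digits, like a fixed-scale column."""
    quantum = Decimal(1).scaleb(-scale)  # scale=4 gives a quantum of 0.0001
    # ROUND_HALF_UP is an assumption chosen for this illustration.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


unit_fee = "0.0049"  # hypothetical per-transaction fee
transactions = 10_000  # hypothetical volume

total_four_places = store_as_money(unit_fee, 4) * transactions
total_two_places = store_as_money(unit_fee, 2) * transactions

print(total_four_places)  # 49.0000
print(total_two_places)   # 0.00
```

With four decimal places the fee survives storage and the total is 49; with two it is rounded to zero before it is ever multiplied, so the error grows with volume.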

Babelfish's approach is to plug into the PostgreSQL engine through its hook mechanism, so that the server presents itself as a different database (the alternative would be to modify PostgreSQL's code in many core areas). Its architecture diagram is as follows:

The clever part is that Babelfish extends part of the executor in the database kernel so that T-SQL and PL/pgSQL can call each other. In other words, newly written PostgreSQL code can call the SQL Server code written for the earlier application. For anyone who has written stored procedures, this feature, like the name Babelfish itself, has a touch of science fiction.

Even with such a hard-core implementation, Babelfish is not yet fully compatible; some functions and syntax, such as ADD SIGNATURE, have not been implemented. An Amazon engineer said: "SQL Server has been in development for more than 30 years, and we don't want to support all features immediately. Instead, we focus on the most common T-SQL commands and return the correct response or error message."
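The hook idea, and the "correct response or error message" policy, can be pictured with a small sketch. Everything here is hypothetical: `Engine`, `tsql_extension`, and `UnsupportedFeature` are invented names, and this is not PostgreSQL's or Babelfish's API (PostgreSQL's real hooks are C function pointers that loadable extensions assign). The sketch only shows the pattern: the core consults an optional hook first, and the extension either handles a statement, rejects it with a clear error, or lets the core carry on.

```python
from typing import Callable, Optional

# A hook is an optional callable slot the core consults before its default path.
ParseHook = Callable[[str], Optional[str]]


class UnsupportedFeature(Exception):
    """Raised for dialect features the extension knows about but does not implement."""


class Engine:
    """Hypothetical stand-in for a database core with a single extension point."""

    def __init__(self) -> None:
        self.parse_hook: Optional[ParseHook] = None

    def parse(self, sql: str) -> str:
        if self.parse_hook is not None:
            handled = self.parse_hook(sql)
            if handled is not None:
                return handled  # the extension claimed this statement
        return f"core-parser:{sql}"  # default behaviour, core code untouched


def tsql_extension(sql: str) -> Optional[str]:
    """Hypothetical extension: handles one T-SQL construct, rejects another."""
    head = sql.strip().upper()
    if head.startswith("ADD SIGNATURE"):
        raise UnsupportedFeature("ADD SIGNATURE is not supported")
    if head.startswith("SELECT TOP "):
        return f"tsql-parser:{sql}"
    return None  # not ours; fall through to the core parser


engine = Engine()
engine.parse_hook = tsql_extension  # "installing" the extension
```

With the hook installed, `engine.parse("SELECT TOP 1 id FROM t")` is routed to the extension, an ordinary `LIMIT` query still goes through the core path, and an `ADD SIGNATURE` statement fails with an explicit error instead of being silently mishandled.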

This illustrates how hard it is to build a migration accelerator of this kind, and it also explains why open source is the most suitable route for Babelfish: open source lets enough developers take part in iterating on the product.

By the same token, a development project this difficult is unlikely to be insignificant. On the contrary, it may be one of Amazon Cloud Technology's most important releases of 2020.

Is the era of database fragmentation really here?

Amazon's cloud computing releases have repeatedly set the direction for the entire industry. For example, Amazon Redshift, released in 2012, led the development of cloud-native data warehouses, and AWS Lambda, released in 2014, led the development of serverless computing (Gartner did not confirm serverless as a future trend until 2019). Amazon Aurora itself is also a pioneering cloud-native database.

If Babelfish also represents a direction, then perhaps the era of database fragmentation has really come.

Because databases are so difficult to develop, the market was long controlled by a handful of companies. Oracle, the leader among them, rapidly raised the bar for developing a commercial database.

However, the high prices and vendor lock-in risk that came with this "unipolar" development have gradually become unbearable for many companies. Today new kinds of databases keep appearing, including relational, key-value, time-series, and graph databases, and choosing among them is hard. Another notable phenomenon is that most cloud-native databases are built on PostgreSQL, yet much of the subsequent R&D effort has not gone into traditional goals such as high performance and high scalability.

Database compatibility, a feature that is hard to build and has nothing to do with performance, has become a focus of Amazon Cloud Technology's R&D. In a sense, this also shows that the many kinds of databases now flourishing will remain in the industry for a long time. People tend to assume that an industry moves from a single option to many, and then, after the market has sifted them, back to one. This time, though, the "unipolar" era may really be gone for good.

In addition, the 2020 Gartner Magic Quadrant report names several leaders in the cloud database field, with Amazon, Microsoft, and Google ranked as the top three.

In 2019, the top three were Microsoft, Oracle, and Amazon. The first and the third were fighting each other, and it was the second that disappeared...

Now Amazon Aurora, with Babelfish, is compatible with Microsoft SQL Server, and the one that gets hurt, I'm afraid, is Oracle. The walls between cloud databases are coming down, and competing against them is becoming even harder for traditional commercial databases.

Riding the era of fragmentation, Amazon, with its release of Babelfish, has naturally become the new leader in the cloud database market.

In closing

The database industry is far from reaching an end state, and there will be no such thing. However, the advantages available to cloud-native databases are not limited to the database itself. For example, Amazon Aurora Serverless provides elastic scaling, Amazon Aurora Global Database improves global data synchronization and business continuity, and Amazon DevOps Guru brings machine learning to application management. This is a "combined force" that takes the database experience on the cloud to a whole new dimension.

In the cloud database field, this "combined force" will shape the next market landscape.

On November 30, re:Invent 2021 will return, and Adam Selipsky will make his debut as the new CEO of Amazon Cloud Technology. The direction of the cloud database market should become clearer then.


SegmentFault (思否) Editorial Team