As 2021 came to a close, many technology fields looked back on the year's achievements and ahead to the trends of the coming year. OtterTune, a database-tuning company that has attracted much attention in the database field, also released a year-in-review report on databases in 2021.
According to the report, developers' habits have changed, and PostgreSQL has become their first choice for new applications thanks to its reliability and rich feature set. Over the past year the database community also remained embroiled in "benchmark wars", database companies competed fiercely for investment, and some established companies were acquired or went bankrupt. In short, it was a year of jaw-dropping changes that can fairly be called "wonderful".
Below are the highlights of the report.
PostgreSQL's dominance stands out
In the past year, developers' conventional wisdom shifted: PostgreSQL has become the DBMS of choice for new applications.
As early as 2010, the PostgreSQL development team moved to a more aggressive release schedule, shipping a new major version every year (H/T Tomas Vondra). And, of course, PostgreSQL is open source.
Compatibility with PostgreSQL is a distinguishing feature of many systems today. Some systems support PostgreSQL's SQL dialect (DuckDB), others its wire protocol (QuestDB, HyPer), and others its entire front end (Amazon Aurora, YugaByte, Yellowbrick). The big players have joined in too: in October 2021, Google announced PostgreSQL compatibility for Cloud Spanner, and in the same month Amazon announced Babelfish for Aurora PostgreSQL, which translates SQL Server queries for Aurora PostgreSQL.
One indicator of a database's popularity is the DB-Engines Ranking. The ranking isn't perfect and its scoring is somewhat subjective, but it is still a reasonable approximation for the top 10 systems.
According to the DB-Engines Ranking, as of December 2021 PostgreSQL is the fourth most popular database (after Oracle, MySQL and MSSQL), and over the past year it further narrowed its gap with MSSQL.
Another trend worth considering is how often PostgreSQL is mentioned in online communities, which provides another signal of what people are talking about in the database world.
Andy Pavlo, co-founder of OtterTune, downloaded the comments posted in 2021 and counted how often each database name appeared. He cross-referenced the names against the list of every database in the Database of Databases and merged common abbreviations (for example Postgres → PostgreSQL, Mongo → MongoDB, ES → Elasticsearch), and then computed the top 10 DBMSs:
Although this ranking is not scientific (there is no sentiment analysis of the comments), it clearly shows that PostgreSQL is mentioned more often than other databases. Developers frequently post asking which DBMS to use for a new application, and the community's answer is almost always PostgreSQL.
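The counting approach described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Andy Pavlo's actual script: the alias table, the function names `count_mentions` and `top_n`, and the sample comments are all hypothetical, and a real run would load the full list of systems from the Database of Databases.

```python
import re
from collections import Counter

# Hypothetical alias table mapping informal names (lowercased) to canonical
# DBMS names. Only a handful of entries are shown; a real run would cover
# every system in the Database of Databases.
ALIASES = {
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "mongo": "MongoDB",
    "mongodb": "MongoDB",
    "es": "Elasticsearch",  # short aliases like this can cause false positives
    "elasticsearch": "Elasticsearch",
    "mysql": "MySQL",
}

# A "word" is a run of letters and digits that starts with a letter.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def count_mentions(comments):
    """Count every occurrence of a known DBMS name across all comments."""
    counts = Counter()
    for text in comments:
        for word in WORD.findall(text):
            canonical = ALIASES.get(word.lower())
            if canonical is not None:
                counts[canonical] += 1
    return counts


def top_n(comments, n=10):
    """Return the n most frequently mentioned DBMSs as (name, count) pairs."""
    return count_mentions(comments).most_common(n)
```

Like the ranking in the report, this counts raw mentions only: it does no sentiment analysis, so a complaint about a system counts the same as praise.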
Andy Pavlo also expressed his opinion on this trend:
First, it's a good thing that relational database systems have become the go-to choice for "startup website" applications. This shows the staying power of the relational model that Ted Codd (the father of the relational database) introduced in the 1970s. Second, PostgreSQL is a great database system. Like every DBMS, it has known issues and dark corners, but with so much attention and energy behind it, PostgreSQL will only get better in the years to come.
Benchmark "dogfights"
According to the report, 2021 saw several database vendors feuding over benchmark results.
Vendors trying to prove that their systems are faster than their competitors' is a practice that dates back to the late 1980s. That is why the TPC was created: to provide a "nonpartisan" forum that moderates such claims. But as the TPC's influence and popularity have waned over the past decade, the industry now finds itself caught up in a new wave of benchmark melees.
This year, there have been three intensifying "street fights" around the Benchmark test.
Databricks vs. Snowflake
Databricks announced that its new Photon SQL engine had set a new world record for the 100TB TPC-DS benchmark. Snowflake immediately hit back, claiming that its database ran 2x faster than Databricks had reported and that Databricks had run Snowflake incorrectly. Databricks countered that its SQL engine still delivered better performance and price/performance than Snowflake.
Rockset vs. Apache Druid vs. ClickHouse
ClickHouse published results claiming that it was far more cost-effective than Druid and Rockset. In response, Imply ran a series of tests on the new version of Druid and declared victory. Rockset then joined in, claiming that its real-time analytics performance was better than the other two.
ClickHouse vs. TimescaleDB
Meanwhile, Timescale smelled blood and immediately entered the fray. It presented its own benchmark results and took the opportunity to point out weaknesses in ClickHouse's technology. Since then, third-party benchmarking has become a hot topic of discussion on Hacker News.
Andy Pavlo commented on this phenomenon: in previous benchmark turf wars, the database community "shed too much blood". Having taken part in these fights myself and been burned many times, I can say with certainty that it is not worth it. Cloud database management systems have so many moving parts and tunable options that it is often difficult to pin down the real cause of a performance difference. Real applications do not just run the same queries one after another, and the user experience of ingesting, transforming and cleaning data matters just as much as raw performance numbers. As one comment quoted earlier put it, "Only old people care about official TPC numbers."
Big data, big money
Since the second half of 2020, the number of venture capital rounds worth at least $100 million has been steadily increasing. In 2020 alone there were 327 of these mega-deals (less than half of all VC deals), and as of January 2021 there were already over 100 VC rounds worth more than $100 million.
In 2021, a lot of investment money went to database companies. In the transactional database space, Cockroach Labs led the fundraising race with a $160 million round, followed by another $278 million in December 2021. Yugabyte completed a $188 million Series C, PlanetScale, the company behind a hosted version of Vitess, closed a $20 million Series B, and DataStax raised $37.6 million for its Cassandra business.
As eye-opening as these numbers are, the analytical database market was even hotter. In September 2021, TileDB raised an undisclosed amount, and Vectorized.io raised $15 million for its Kafka-compatible streaming platform. StarTree announced a $24 million round to commercialize Apache Pinot. Materialize, a DBMS built around materialized views on steroids, then announced a $60 million Series C, Imply raised $70 million for its Apache Druid-based database service, and SingleStore raised $80 million in 2021, bringing it one step closer to an IPO.
In early 2021, Starburst Data raised $100 million for its Trino system (formerly PrestoSQL). Firebolt, a DBMS startup that had been operating in stealth, announced that it had raised $127 million for its new cloud data warehouse based on ClickHouse. The newly formed ClickHouse, Inc. raised a staggering $250 million.
All of this pales next to Databricks, whose largest raise was $1.6 billion in August 2021, a figure that left everyone else's jaws on the floor.
Andy commented: We are in a golden age of databases, with many good choices. Investors are hunting for the database startup that could be the "next Snowflake" IPO, and startups are now raising more money, earlier, than previous database startups did. For example, Snowflake had raised less than $100 million before its Series D, whereas Starburst closed a $100 million round less than three years after it was founded. Many factors affect funding, but far more money is flowing into databases these days.
In memoriam
The past year also brought some regrettable news, as we bid farewell to several old friends in the database field.
ServiceNow acquires Swarm64
Swarm64 started out as an FPGA accelerator for running analytical workloads on PostgreSQL. The company later became a software-only accelerator built as PostgreSQL extensions, but it failed to gain traction, especially compared to well-funded cloud data warehouses. Since the acquisition by ServiceNow, there has been no further news about Swarm64's products.
Splice Machine Bankrupt
Splice Machine built a hybrid (HTAP) DBMS that combined HBase for transactional workloads with Spark SQL for analytics. It later pivoted to offering a platform for operational and real-time ML applications. But given the dominance of dedicated OLTP and OLAP systems, all-in-one hybrid systems have failed to make headway in the database market.
Private equity firm acquires Cloudera
Over the past few years, MapReduce and Hadoop have fallen out of fashion, and Cloudera has lost its appeal in the face of cloud data warehouses. Most of the original engineering teams behind Impala and Kudu have left the company, although both projects are still under development and continue to ship new releases. The stock had traded below its IPO price since 2018. It remains to be seen whether the company's new owners can turn it around.
"It's always sad to see a database project or company fail, but that's the nature of the database industry," Andy said. Open source can help a DBMS outlive the company that created it, but that is not always the case. Databases are complex enough to require a full-time staff to fix bugs and add new features, so handing the source code and control of a soon-to-be-defunct DBMS to an open source software foundation (such as the Apache Software Foundation or the CNCF) does not mean the project will miraculously revive. More database companies are expected to go under next year, as many cannot compete with the major cloud providers and the well-funded startups mentioned above.
Challenges and Opportunities
The post-pandemic era will be a difficult period for many people, but challenges bring opportunities.
Back in 2015, Oracle co-founder Larry Ellison was the fifth richest person in the world. But fortunes are unpredictable, and by 2018 he had fallen to No. 10 on the list.
Fortunately, things took a turn for the better. In December 2021, Oracle reported better-than-expected results and its stock posted its second-largest gain of the past 20 years, earning Larry Ellison $16 billion in a single day. He overtook Google co-founders Larry Page and Sergey Brin to return to fifth place among the world's richest people.
This story is undoubtedly exciting and moving for the database community. That is especially true for Andy, who considers databases the most important part of his life outside of his family.
In conclusion, databases are an industry of extraordinary resilience and innovation, and we all look forward to 2022 being a bright year.
Check out the full report: https://ottertune.com/blog/2021-databases-retrospective/