Once upon a time, backing up and restoring data when changing phones was a headache. iCloud solved backup management for the iPhone so elegantly that it has won users over, and iPhone users can now switch phones without worry.
For every database user, backup and recovery is likewise a hard requirement. Backup management is often overlooked, but when something goes wrong it is a major incident. For an enterprise, without data there is nothing, and data backup and recovery is a powerful means of preventing disasters before they happen. According to statistics, about 11% of the cost and effort in an enterprise is allocated to data backup management.
Currently, TiDB users can use BR to take a full backup followed by incremental backups and restore data from them, but they can only restore to the few points in time at which a backup was taken, which is not fine-grained enough. Using TiCDC or binlog to record incremental events makes it possible to restore data to any point in time after a full backup, but BR itself does not natively support PiTR (point-in-time recovery). Even where this capability exists, current solutions in the industry are complex, fragile, and costly. Can data recovery be as simple for DBAs as switching iPhones?
In this TiDB Hackathon, the pCloud team built pCloud: a SaaS service fully hosted in the cloud that handles database backup and restore in one place, supports Time Travel (restore to any point in time), and is built on S3 storage underneath, so the storage cost is extremely low. The project won the "Second Prize" and the "Best Market Potential Award Specially Sponsored by Yunqi Capital".
pCloud is a very interesting project. Dongxu pitched the product live on stage like a salesman. Setting aside his personal stage presence, from a practical point of view pCloud did a really good job. Although Dongxu only demonstrated the product and talked about the business model, I know that the underlying implementation of this project is very challenging. This also offers a lesson for participants in the next TiDB Hackathon: with a project, it is easy to focus on the technology itself, but if we are building a product or a SaaS service, understanding users and the business is just as essential. So even if you feel you don't know TiDB well and can't write very hard-core programs, you can make a breakthrough in other directions.
——Judge Tang Liu
Team: How was pCloud formed?
The pCloud team consists of 4 players, including our CTO Huang Dongxu, who is competing for the first time. He has been a judge in previous competitions but has always been itching to be a contestant, feeling that the Hackathon gets more hard-core every year. This time his main role in the team is to contribute the project idea, draw product prototypes, and guide the team's development as product manager.
Huang Dongxu: When I decided to compete, I first went to Long Heng (TiMatch captain) and asked him: I have a great idea, do you want to build it with me? He said he had already committed to TiMatch, but he still helped me find three team members: Luan Cheng, Wang Hao and Juncen.
Luan Cheng: When Dongxu came to me, he described his idea and said he needed a front-end engineer. I thought of Wang Hao, a front-end expert among my colleagues from Zhihu. He was in Singapore, where it was probably the Christmas holiday at the time, and he agreed to join just in time. In pCloud, I am mainly responsible for building the front-end website. Juncen and I have been working on backup and recovery at PingCAP. Our main task this time was to connect existing tools (such as BR) to pCloud and do some adaptation work.
Huang Dongxu: Although it is my first time competing, as a veteran judge I know that how complete a project is matters a lot, so finding a good front-end engineer is half the battle.
(Dongxu: I specially asked a designer to design a product logo, a good logo is half the success~)
Origin: A business model brainstorming
Huang Dongxu: Around the beginning of 2020, we had a brainstorm on a scalable business model for TiDB. Generally speaking, the business model of a database is basically to sell services and the like, but I vaguely felt that open source is very similar to ToC. Is it possible to use some ToC ideas to look at the commercialization of TiDB?
In the past, the business model of commercial databases was basically to charge service fees: the more important the scenario being protected, the higher the fees. But this idea has changed significantly in the era of "open source + cloud". First, the cloud will become extremely popular. Second, open source databases have surpassed closed source databases, their maturity basically meets the needs of most core business scenarios, and open source database users have become a very large group. Third, the cloud has the advantage of standardizing delivery and lowering the threshold for payment. On the ToC side, it was only after WeChat Pay, Alipay and other fast payment channels made the payment threshold extremely low that subscription models such as iQiyi's came into being.
Therefore, based on these three premises, a new idea emerges: find a common path in the user journey; first, whether or not it is a core scenario, optimize the user (developer) experience; then use cloud infrastructure to reduce costs; finally, use a traffic portal plus PLG to grow virally. The keys to the pCloud project are: first, whether enterprise users can find a low-threshold payment channel, such as SaaS; second, the model must be very light, not particularly expensive, and must have a highly automated entry point, because once it requires manually provided services the model falls apart; third, the user base must be wide enough. Databases already have a particularly wide user base, and within databases there is one thing whose user base is especially broad: backup and recovery.
In the ToC field, there are several prerequisites for impulse buying: first, you can immediately understand what the thing is; second, it feels like something you must have; third, it must be very cheap. When all of these hold, consumers may buy on impulse. So we borrowed the concept of iCloud for phones to create the impression that "I must have this". Because everyone needs iCloud, we translated the concept into the DBA world. An ordinary consumer is easily attracted by such a concept and willing to pay for it.
Challenge: It doesn't look hardcore, but the engineering is complex
Luan Cheng: Overall, I think the most difficult thing is to integrate these resources and hide many technical details, presenting users with a simpler and easier-to-use interface. Before this, PiTR backup, TiUP, and the SaaS service were very fragmented pieces, but for this demo we had to integrate them, which took real effort. For example, once TiUP is integrated with PiTR, there are actually many components behind it that run backups and then write incremental data to S3.
Chen Yu: I have discussed a similar project myself. If their software were actually put to use, a lot of manpower and resources would go into implementation. The best business model is one where everything is self-service and customers can solve most problems by themselves. That way the whole thing scales easily, but many products on the market today are so complex that the implementation team may be larger than the engineering team.
Many people think that the biggest flaw of this project is that it doesn't look very hard-core. On technology and community scores, I did not rate this project especially high. But in the end I gave it a special award because of the product's concept: "simple and easy to use". This reminds me of the war of words between Snowflake and Databricks: Databricks always talks about performance, while Snowflake emphasizes ease of use.
Huang Dongxu: To be honest, I think many teams or projects try to do everything. To meet the diverse needs of customers, the products they design become very complicated. In this project I wanted to try a different concept: I know very well who my customer is, and I draw a clear boundary; anyone whose complexity exceeds that boundary is not my customer. Customers inside the boundary get a very smooth user journey at a very low price. In short, my customers should have a great time using it. I believe that if this kind of customer can be won in volume, it will also be a good business model.
What are the highlights of the pCloud project?
Huang Dongxu: There are actually several hard-core points in this project. First, enterprise backup is never just full backup; it relies on incremental backup. Incremental backup raises some problems, such as going back to any point in time without breaking transaction isolation at that point. With the carefully designed PiTR technology (full + incremental), we can keep full backups at low cost and support restoring data to any point in time. Second is high availability: we have to consider whether the cluster can always reach the external network. If the network fails, what happens to the local cache? If the network is connected, you can see the status of the clusters in your data center in real time on the SaaS service. Third, to meet global security compliance, data can be stored with asymmetric encryption. Individual users may not care much about information security, but enterprise users will definitely need this feature, and it can also be one of the product's paid services.
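The "full + incremental" idea can be illustrated with a minimal sketch. This is not how BR or pCloud is implemented; the names `FullBackup`, `Change`, and `restore_to` are hypothetical, and the sketch assumes that every write of a transaction is logged with that transaction's commit timestamp. Restoring to a target time then means picking the latest full backup at or before it and replaying only the changes committed in between, so each transaction is either fully visible or not visible at all.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FullBackup:
    ts: int                 # timestamp at which the snapshot was taken
    data: Dict[str, str]    # full key/value contents at that timestamp


@dataclass
class Change:
    commit_ts: int          # commit timestamp of the owning transaction
    key: str
    value: Optional[str]    # None means the key was deleted


def restore_to(fulls: List[FullBackup], log: List[Change],
               target_ts: int) -> Dict[str, str]:
    """Rebuild the key/value state as of target_ts (hypothetical helper)."""
    fulls = sorted(fulls, key=lambda f: f.ts)
    # Latest full backup taken at or before the target time.
    idx = bisect_right([f.ts for f in fulls], target_ts)
    if idx == 0:
        raise ValueError("no full backup at or before target_ts")
    base = fulls[idx - 1]
    state = dict(base.data)
    # Replay changes committed after the snapshot, up to and including the
    # target. All writes of one transaction share a commit_ts, so a
    # transaction is applied entirely or not at all.
    for change in sorted(log, key=lambda c: c.commit_ts):
        if base.ts < change.commit_ts <= target_ts:
            if change.value is None:
                state.pop(change.key, None)
            else:
                state[change.key] = change.value
    return state


if __name__ == "__main__":
    fulls = [FullBackup(10, {"a": "1"})]
    log = [Change(20, "a", "2"), Change(20, "b", "x"), Change(50, "a", "3")]
    print(restore_to(fulls, log, 20))
```

Taking full backups more often shortens the log that has to be replayed, while taking them less often saves storage; that trade-off is the cost lever the paragraph above refers to.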
In general, although this project is not as hard-core as changing the database kernel, we used a lot of current technologies, and it is actually a product with high engineering complexity. When forming the team, I even wanted to find a colleague from finance to help me design a more reasonable billing model, but then I thought that was a bit too complicated, so I ended up using a relatively simple approach. If this product is developed further, though, billing will be a very complicated module.
Who are the hypothetical customers of the pCloud project?
Judge Chen Yu believes that most Hackathon teams present like engineers: they talk a lot about technology but relatively little about product design. By contrast, Dongxu's way of presenting the pCloud project stood out in the whole Hackathon. Even during the competition he had already thought through the future commercialization model of the entire project, which greatly increased its appeal to Chen Yu as an investor.
Huang Dongxu: There are some small and medium-sized start-ups in China that find RDS in the cloud too expensive, so they buy a virtual machine, deploy TiDB or MySQL on it, and run some enterprise-level applications. Because of budget constraints, they generally don't hire a full-time DBA, and they don't want engineers' time spent on these things, so they may be the target customers of this product. In the first phase of the business model I envision, pCloud itself becomes a "bridge to the cloud". Backups are stored in the cloud using S3. The best part is that S3 is a cloud-neutral standard protocol, and every cloud has an object storage service that speaks it. The second stage therefore needs to move towards a channel business model. Two things need to be done at this stage:
Broaden the sources (not "Open Source" in the software sense, but opening up new sources): support more databases as PiTR data sources (extend the service to MySQL, PG, Oracle, etc.);
Provide one-click recovery to cloud databases (e.g. TiDB Cloud, Aurora, RDS, Snowflake).
The channel business model is about attracting traffic. This stage is bound to be one where cloud database vendors compete for users. If pCloud works well, it will be a natural traffic pool with many possible plays. A good example is the acquisition of mLab by MongoDB in 2019, which probably followed similar logic.
Of course, this stage is not permanent. I have said that the endgame of databases must be in the cloud, so the template for pCloud's next stage is probably Fivetran + Rockset: it becomes a data computing platform in the cloud, but with a better foundation, because pCloud holds the full data. This stage needs to introduce cloud ETL and Serverless to reduce the cost of data processing and analysis in the cloud, and it has no ceiling (see Snowflake).
When community users need to move to the cloud in the future, this is actually a rather ingenious tool. They can restore data directly to TiDB's DBaaS through pCloud, without even needing data import or data migration tools.
How far is the pCloud project from the hypothetical business model?
Huang Dongxu: It needs to be divided into two parts. The first is the maturity of PiTR, and the second is the maturity of pCloud. pCloud is quite simple, the key is the maturity of PiTR. I'll make sure it's on the PingCAP Roadmap.
Luan Cheng: Yes, it is mainly the quality, performance and supported scenarios of PiTR itself. We all know that 1TB of data and 1GB of data are different orders of difficulty. The key is whether the stability and performance of the tool can meet the requirements. I remember a question from the FAQ during the competition: what if the data volume is very large and the IDC's outbound bandwidth is saturated? We would need to do something like data compression; the main difficulty lies there.
Chen Yu: Backup will probably always be a hard requirement, but there are indeed few cloud backup products that are easy to use. This does not have to be done within the TiDB ecosystem; in essence, it is a general requirement for any database. If someone wanted to, they could set up an independent company to provide cloud backup for TiDB or any other cloud database. This is where I see the potential of this project.
Feelings and gains from participating in the TiDB Hackathon
Although he won the second prize in the Hackathon, Dongxu still felt some trepidation, lamenting that this year's players were all too strong.
Huang Dongxu: I think next year I should go back to being a judge, not a contestant. Originally I wanted to do something hard-core, but when I saw so many very hard-core projects in the preliminary round, I felt out of my depth. When I am a judge again, I can also translate some highly specialized technical language into plain language during the judges' discussions and make it clear to the other judges.
This Hackathon leaves some regrets. There are many features we wanted to build but haven't yet, such as key management. Because the Hackathon is a two-day competition, I needed to get a demo out quickly, but my inner product manager is still burning to do more.
Wang Hao: As someone who is neither a PingCAP employee nor a member of the database community, I feel that the people in this community are very hard-core and passionate. I also feel that the whole community cares a great deal about being hard-core, and everyone seems to think that changing the kernel is the cool thing to do. If you want to broaden the group of participants and get more ideas, you could set up one or two awards for projects that seem softer, such as PM-style idea projects, so that more people outside the community can get involved and contribute.
Luan Cheng: I personally think Hackathon projects are difficult to compare. For example, it is hard to judge between a very hard-core performance tuning project and a very novel idea. I hope the next organizing committee will judge these kinds of projects separately and set up independent awards for them.
Yu Juncen: I was very happy to take part in this year's competition. I could write code freely and build my own idea, which is the most attractive part of a Hackathon. I personally hope the next Hackathon will be even livelier, with even better food.
Chen Yu: My strongest impression this year is that project quality is generally higher than last year. As a bystander, I feel that the proportion of participants from outside PingCAP has greatly increased in this Hackathon, and there is a more diverse collision of ideas. As the TiDB Hackathon is held year after year, the obvious good ideas are bound to be used up, which forces players to find more hard-core ideas if they want to stand out in future competitions.