头图

This article was shared by the engineer "Jon" of the WeChat client technical team. The original title "Windows WeChat: The Evolution of the Message Database Architecture" has many revisions.

1 Introduction

What this article shares is that the WeChat client team optimizes and transforms the architecture of the IM local database on the Windows side of WeChat based on the daily usage scenarios and data analysis of WeChat users, by separating important and non-important data and adopting a reliable sub-database strategy. , and finally get a technical transformation plan with good practical effect.
图片

The following are related technical articles, and interested readers can read them together:
Wechat client SQLite database damage repair practice WeChat mobile full-text retrieval optimization road WeChat mobile full-text retrieval polyphonic problem solution WeChat iOS latest full-text retrieval technology optimization practice WeChat local database crack version (including iOS, Android), only for learning Research [Attachment Download]

study Exchange:

  • Introductory article on mobile IM development: "One entry is enough for beginners: developing mobile IM from scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK (click here for alternate address)

(This article has been published simultaneously at: http://www.52im.net/thread-4034-1-1.html )

2. Background description

WeChat's Windows client has grown steadily since its launch in 2014. Over time, the amount of messages accumulated locally by many users is increasing.
图片
The original local IM database design adheres to the principle of "easy to use and easy to manage", and stores all messages received by the user in the "same SQLite data file" locally on the user's current client. (Author's note: WeChat does not save chat records, and the chat content is only stored on the user's mobile phone, computer and other terminal devices.)

3. Current issues

Due to the shortcomings of the initial local database design scheme, with the increasing use of WeChat and the accumulation of more and more messages, many technical problems have gradually been exposed.
3.1 Question 1: Data query is slow. With the passage of time, the data gradually increases. When the amount of data becomes larger and larger:
1) The query and insertion efficiency of the database will be affected;
2) Even if there is an index in the message database, the query efficiency of the index will also decrease.

From a file system perspective, database files grow page by page.

Because the long-term use of WeChat will gradually accumulate the amount of messages and increase the size of the database, it will also lead to more serious fragmentation, which will further affect the read and write efficiency under the mechanical hard disk. The most intuitive impact on users is that switching chats becomes very stuck. This problem is especially serious for heavy users, and even clicks to chat will freeze.

3.2 Problem 2: Large storage files With the passage of time, the amount of messages gradually accumulates, and the volume of database storage files is also increasing, which significantly occupies user storage space.

3.3 Problem 3: Disk file corruption Accidental corruption of disk files may also lead to data loss. Because all messages are put into one database file, it is like putting all the eggs in one basket. The database file may also be damaged due to bad sectors in storage, unexpected power failure of the computer, and bugs in sqlite itself. In the event of corruption, the user may lose message data. Even if there is a DB recovery mechanism, there is no guarantee that all historical records can be recovered.

When this happens, the impact on the user is huge, because the chat history may be gone!

PS: WeChat mobile also has similar problems. If you are interested, you can read "WeChat Client SQLite Database Damage Repair Practice".

4. Reason analysis

4.1 Overview The above problems of large database storage files and slow queries are caused by the continuous increase of message data. But the growth of the number of messages is unavoidable, so is there a way to control the growth rate and control the size of the database? We analyze from two directions: message situation, daily usage scenarios

4.2 Analysis 1: Message situation The IM messages in WeChat can be divided into three categories:
1) Single chat message;
2) Group chat messages;
3) and subscription account/service account messages (collectively referred to as public account messages).
In order of importance of the message:
1) Single chat/group chat message: This is the user's private message, which is deleted or lost and cannot be recovered, causing the greatest loss to the user;
2) News of the official account: As long as you follow the official account, you can pull and read it, which belongs to the public news, so it is less important to users.

In terms of message size:
1) Based on the analysis of the message size data of the test account, we found that the public account messages, which account for a small proportion of the total number, occupy more than half of the database space;
2) After analyzing the message types of the test accounts: web card messages are the main type of official account messages, and the average message body size is dozens of times that of text messages.

4.3 Analysis 2: Analysis of Daily Application Scenarios As we all know, we use WeChat every day to send and receive messages, or browse recent messages. For earlier news, we generally rarely take the initiative to browse. The earlier the message, the lower the probability of viewing. So: in most scenarios, we want the most frequently accessed messages to be unaffected by old data.

5. Solutions

5.1 Overview In view of the above problems and combined with the above analysis, we have evolved and optimized the architecture of the local SQLite database on the Windows side of WeChat from the following aspects.

The main optimization contents and means involved are:
1) Sub-library transformation;
2) Establish a message index;
3) Message volume optimization;
4) Improve database robustness.

Below we will introduce each one in detail.

5.2 Reconstruction of sub-databases Based on the above analysis, the public account messages are first divided and stored in a separate database, which is isolated from the user's common messages, and can also greatly reduce the size of the common message database.

Based on the analysis of daily usage scenarios, most of the old data is read at a low frequency, so the read and write efficiency in the recent period should be improved. For the above situation, we adopt a scheme of dynamically dividing the database in time and space. The initial default value is that each database stores messages for half a year, and after the time expires, a new database is created to store them. For most usage scenarios, we only need to read and write the latest database to meet the requirements. If we need to browse earlier messages, we can open the previous database for reading. In addition to the time dimension, we also consider the division of the space dimension: if the size of ordinary messages exceeds the threshold within half a year, a new database will be created for storage, so that the size and data size of each database will not be too large, which can increase the number of messages in the recent period of time. read and write efficiency.

图片

5.3 Indexing messages For the broadest usage scenario - viewing messages per chat, this scenario requires building an index for each chat session. The index scheme here refers to the Android side: that is, convert each chat into a numerical ID, thereby reducing the length of each index and improving the efficiency of reading and writing the index. (For the complete database structure of WeChat's mobile SQLite database, please refer to: "WeChat Local Database Cracked Version (including iOS, Android), for study and research only [Attachment download]")

In addition, we also extract some frequently accessed content into a field and add an index. For example, the subtype of the message (this is a serialized field in the old database) has no index, but this field is often used, so it is proposed to be a column separately, and an index is added to facilitate the search for messages by type.

5.4 Message volume optimization
Obviously, there will always be more and more messages in IM, but how to reduce/compress the volume of message data without affecting the efficiency of reading and writing is also our optimization direction. From the above data, some messages are large in size, which has exceeded the size of each page of the database (Page Size). The database stores data by page, and the Page Size is the data that a page of the database can hold. If a piece of data cannot fit on one page, you need to use the overflow page, and put the extra data that cannot fit in the overflow page. There can be multiple overflow pages. At this time, if you read this data, you need to read all the overflow pages, which will increase the IO consumption. If the data is compressed, the message body can be compressed into a page that can fit, and the use of overflow pages can be reduced, which can increase the IO performance. SQLite database overflow page structure:
图片
(The picture above is quoted from page 308 of the book "The Definitive Guide to SQLite") PS: I also found the electronic version of the book "The Definitive Guide to SQLite" for you, please download it from the following attachment: The Definitive Guide to SQLite (2nd edition, 2010)-52im.net.pdf.zip (3.61 MB)
图片

However, compression requires CPU resources, and choosing an algorithm that can balance performance and compression ratio is the key here. After comparing the Benchmark of the compression algorithm and measuring the compressibility of the message body, a high-performance compression algorithm was finally selected: lz4.

After analyzing the data of the test account, the size of different types of messages varies greatly. Generally speaking: the length of the text message is not particularly large, but the message of the webpage card type will be larger in size. Due to different message lengths, the compression ratios obtained are different, and if the text length is too short, it is meaningless to compress. Therefore, after analyzing the length of the message body, compression, and compression performance, it is finally determined to compress the web card, etc., under the premise of lower performance consumption, the comprehensive compression rate can reach 40%, reducing the number of IOs.
图片

5.5 Improve robustness If the database file is damaged due to external reasons, it will have a great impact on the experience. Reducing the damage rate and reducing the data loss caused by damage is also the direction of our improvement. After dividing the database according to the time dimension, it is equivalent to dispersing and storing messages according to time. The latest database is responsible for reading and writing recent messages, and the rest of the databases only need to support browsing and viewing messages as needed. For old databases: it can be loaded on demand, which reduces the read and write to the database and reduces the chance of these databases being damaged. Once a database is damaged, even if it cannot be recovered, all messages will not be lost, but only the messages in the corresponding time period of the database will be lost, which can also reduce the loss caused by partial database damage. In the single-database architecture used in the early days, because the data will be accumulated more and more, the database volume will continue to grow, and it is difficult to do backups.

After the database is divided, the size of each database becomes smaller, so database backup becomes more feasible. Because the latest database has frequent message reading and writing, the probability of damage is much higher than that of the old database, so the latest database is backed up regularly. Under the default configuration, we will back up the latest database at regular intervals, which is a complete copy of the latest database. If the latest database is damaged during reading and writing, it will try to restore from the backup data first. If the recovery is successful, the data from the backup to the recovery period will be lost at most, further reducing the loss caused by the damage.

6. Optimize the comparison

After comparison, for an original message database in the test account, the size can be reduced by nearly half after compression, and the number of overflow pages and the number of records that need to use overflow pages are also reduced by more than half.
图片
For read and write performance, compared to before compression, the read and decompression performance after compression is nearly 10% higher than before.
图片

7. Future Outlook

In the future, our WeChat client team will continue to study the practice related to database repair, continue to pay attention to the performance data related to the database, improve reliability, and create a better user experience!

Appendix: Summary of more IM articles from major manufacturers

[1] Original technical article by the WeChat team:
"Technical Challenges and Practical Summary behind the Hundreds of Billions of Visits in WeChat Moments"
"IM Full-Text Retrieval Technology Topic (2): Solution to the Problem of Multi-Phonetic Words in Full-Text Retrieval on WeChat Mobile Terminal"
"WeChat Team Sharing: Technical Practice of High-Performance Universal Key-Value Component of WeChat for iOS"
"WeChat Team Sharing: How does WeChat for iOS prevent group explosions and APP crashes caused by special characters? 》
"WeChat Team Original Sharing: Technical Practice of Memory Monitoring System in WeChat for iOS"
"iOS background wake-up combat: WeChat payment to the account voice reminder technology summary"
"WeChat Team Sharing: Principles and Application Scenarios of Super-Resolution Technology for Video Images"
"WeChat Team Sharing: Deciphering the Technology Behind WeChat's Daily Billions of Real-Time Audio and Video Chats"
"WeChat Team Sharing: Those Pits Filled by WeChat Android Version of Small Video Coding"
"IM Full-Text Retrieval Technology Topic (1): The Road to Full-Text Retrieval Optimization of WeChat Mobile Terminal"
"Optimization of Synchronization Update Scheme of Organizational Structure Data in Enterprise WeChat Client"
"WeChat Team Disclosure: The Ins and Outs of WeChat Interface Stuck Super Bug "15...""
"How the Super IM WeChat with 889 million monthly active users is tested for Android compatibility"
"An article to get everything about the WeChat open source mobile database component WCDB! 》
"WeChat Client Team Leader Technical Interview: How to Start Client Performance Monitoring and Optimization"
"WeChat Backstage Design Practice of Mass Data Cold and Hot Hierarchical Architecture Based on Time Series"
"WeChat Team Original Sharing: The Bloat and Modular Practice of Android WeChat"
"WeChat Background Team: Optimization and Upgrade Practice Sharing of WeChat Background Asynchronous Message Queue"
"WeChat team original sharing: WeChat client SQLite database damage repair practice"
"WeChat Mars: The network layer encapsulation library being used inside WeChat, which will be open source soon"
"As promised: WeChat's own mobile IM network layer cross-platform component library Mars has been officially open sourced" "Open source libco library: the cornerstone of the background framework that supports 800 million WeChat users with 10 million connections on a single machine [source code download]"
"WeChat New Generation Communication Security Solution: Detailed Explanation of MMTLS Based on TLS1.3"
"WeChat team original sharing: Android version WeChat background keep alive actual combat sharing (process keep alive)"
"WeChat team original sharing: Android version WeChat background keep-alive actual combat sharing (network keep-alive)"
"Technical evolution of Android version WeChat from 300KB to 30MB (PPT presentation) [Attachment download]"
"WeChat Team Original Sharing: Technological Evolution of Android WeChat from 300KB to 30MB"
"WeChat Technical Director Talking About Architecture: The Way of WeChat - The Great Way to Jane (full speech)"
"WeChat Technical Director Talking About Architecture: The Way of WeChat - The Great Way to Simple (PPT Lecture) [Attachment Download]"
"How to Interpret "WeChat Technical Director Talking About Architecture: The Way of WeChat - Avenue to Simplicity"
"Background System Storage Architecture Behind Massive WeChat Users (Video + PPT) [Attachment Download]"
"WeChat Asynchronous Transformation Practice: The Background Solution Behind 800 Million Monthly Active Users and 10 Million Connections on a Single Machine"
"WeChat Moments Massive Technology Way PPT [Attachment Download]"
"Technical Experiment and Analysis of WeChat's Influence on the Internet (Full Paper)"
"A summary note on the technical architecture of WeChat's background"
"The Way of Architecture: 3 Programmers Achieve an Average Daily Publishing of 1 Billion in WeChat Moments [with Video]"
"Fast Fission: Witness the Evolution of Wechat's Powerful Background Architecture from 0 to 1 (1)"
"Fast Fission: Witness the Evolution of Wechat's Powerful Background Architecture from 0 to 1 (2)"
"WeChat Team Original Sharing: Summary of Android Memory Leak Monitoring and Optimization Skills"
"Comprehensive summary of the various "pits" encountered by the iOS version of WeChat to upgrade iOS9"
"WeChat Team Original Resource Obfuscation Tool: Make Your APK Instantly Reduce 1M"
"WeChat team's original Android resource obfuscation tool: AndResGuard [with source code]"
"The actual combat record of "weight loss" of the WeChat installation package for Android version"
"WeChat Installation Package "Weight Loss" Practical Record for iOS"
"Mobile IM Practice: The iOS Version of WeChat Interface Caton Monitoring Solution"
"Technical Problems Behind WeChat "Red Envelope Photos""
"Mobile IM Practice: A Record of the Technical Solution for the iOS Version of WeChat's Small Video Function"
"Mobile IM Practice: How the Android version of WeChat greatly improves interactive performance (1)"
"Mobile IM Practice: How the Android version of WeChat greatly improves interactive performance (2)"
"Mobile IM Practice: Realizing the Intelligent Heartbeat Mechanism of Android WeChat"
"Mobile IM Practice: Discussion on the Multi-device Font Adaptation Scheme of WeChat for iOS"
Detailed explanation of IPv6 technology: basic concepts, application status, and technical practice (Part 1)
Detailed Explanation of IPv6 Technology: Basic Concepts, Application Status, and Technical Practice (Part 2)
"Interview with WeChat Multimedia Team: Learning of Audio and Video Development, Audio and Video Technology and Challenges of WeChat, etc."
"Tencent Technology Sharing: The Story Behind WeChat Mini Program Audio and Video Technology"
"Interview with Wechat Multimedia Team Liang Junbin: Talking About Audio and Video Technology I Know"
"Tencent Technology Sharing: Technical Ideas and Practices for Interworking Wechat Mini Program Audio and Video and WebRTC"
"WeChat Technology Sharing: WeChat Mass IM Chat Message Serial Number Generation Practice (Algorithm Principles)"
"WeChat Technology Sharing: WeChat Mass IM Chat Message Serial Number Generation Practice (Disaster Recovery Scheme)"
"WeChat team sharing: Kotlin is gradually recognized, the technology early adopter journey of Android version WeChat"
"Deciphering of social software red envelope technology (2): Deciphering the technological evolution of WeChat shake red envelope from 0 to 1"
"Social Software Red Packet Technology Decryption (3): The Technical Details Behind WeChat Shake the Red Packet Rain"
"Decryption of Social Software Red Envelope Technology (4): How Does WeChat Red Envelope System Cope with High Concurrency"
"Decryption of Social Software Red Envelope Technology (5): How the WeChat Red Envelope System Achieves High Availability"
"Decryption of Social Software Red Envelope Technology (6): The Evolution Practice of Storage Layer Architecture of WeChat Red Envelope System"
"Decryption of Social Software Red Packet Technology (11): Decryption WeChat Red Packet Random Algorithm (including code implementation)"
"WeChat Team Sharing: Extreme Optimization, Practical Summary of 3 Times the Compilation Speed of WeChat for iOS"
"Is the IM "scanning" function easy to do? Take a look at the complete technical implementation of WeChat's "scanning and recognizing things""
"WeChat Team Sharing: Thinking on Mobile Software Architecture brought by WeChat Payment Code Refactoring"
"IM Development Collection: The most complete in history, a summary of various functional parameters and logic rules of WeChat"
"WeChat Team Sharing: The Evolution of the 15 Million Online Message Architecture in WeChat Live Chat Room"
"Demystifying the IM Architecture Design of Enterprise WeChat: Message Model, Ten Thousand Crowds, Read Receipts, Message Withdrawal, etc."
"IM Full-Text Retrieval Technology Topic (4): The Latest Full-Text Retrieval Technology Optimization Practice of WeChat iOS Terminal"
"Wechat Team Sharing: How Wechat Backend Does Not Crash Under Massive Concurrent Requests"
"Optimization practice of WeChat Windows IM message database: slow query, large volume, file corruption, etc."

More similar articles...

[2] The technical story behind WeChat:
"Technology Past: WeChat's valuation has exceeded 500 billion, Lei Jun had the opportunity to incorporate Zhang Xiaolong and his Foxmail"
How much did Tencent spend to develop WeChat? Is it technically difficult? Where is the difficulty? 》
"Development Past: In-depth narration from 2010 to 2015, behind WeChat's ups and downs"
"Development Past: The Origin of WeChat's Unchanged Splash Screen Picture for Thousands of Years"
"Development Past: Recording the Story Behind WeChat Version 3.0 (9 months from the release of WeChat 1.0)"
"A WeChat Intern's Self-Report: The WeChat Development Team in My Eyes"
"WeChat's Seven Years Review: After How Many Questions and Bad Reviews Have Been Experienced, Only With Today's Power"
"Sharing by Former Founding Team Members: Inventory of WeChat's Past and Present: The Inevitability and Contingency of WeChat's Success"
"Instant Messaging Entrepreneurship Must Read: Deciphering WeChat's Product Positioning, Innovative Thinking, Design Rules, etc."
"[Technical Brain Hole] Is it technically possible to pull 1.4 billion Chinese people into a WeChat group? 》
"The chicken ribs developed by WeChat in those years and the thoughts it brings us"
"Understanding WeChat: From Version 1.0 to Version 7.0, the Evolutionary History of a Mainstream IM Social Tool"
"Interview with Ma Huateng: First talk about personal experience, management experience, technological innovation, the birth of WeChat, etc."
"Understanding Zhang Xiaolong, the Father of WeChat in One Text: Failed Genius, Subversive, Dictator, and Human Manipulator"

More similar articles...

[3] Alibaba's technology sharing:
"Alibaba Dingding Technology Sharing: The King of Enterprise IM - Dingding's Superiority in Back-end Architecture"
"Discussion on the Synchronization and Storage Scheme of Chat Messages in Modern IM System", "Ali Technology Sharing: Deeply Revealing the 10-Year History of Alibaba's Database Technology Solutions", "Ali Technology Sharing: The Hard Road to Growth of Ali's Self-developed Financial-Grade Database OceanBase" " From Alibaba OpenIM: Technical Practice Sharing of Creating Safe and Reliable Instant Messaging Services" "Dingding - Technical Challenges of a New Generation of Enterprise OA Platform Based on IM Technology (Video + PPT) [Attachment Download]" "Ali Technology Crystallization: "Alibaba Java Development Manual (Protocol) - Huashan Edition" [Attachment Download]" "Heavy Release: "Alibaba Android Development Manual (Protocol)" [Attachment Download]" "The author talks about the background behind "Alibaba Java Development Manual (Protocol)" "Story", "The Story Behind Alibaba's Android Development Manual (Protocol)", "Dried this bowl of chicken soup: from the barber shop boy to the Ali P10 technology giant", "Uncovering the rank and salary system of Ali, Tencent, Huawei, and Baidu" "Taobao Technology Sharing: The Road to Technological Evolution of the Mobile Access Layer Gateway of Hand Taobao Billion Level" "Rare Goods, Revealing Alipay's 2D Code Scanning Technology Optimization Practice Road" "Taobao Live Technology Dry Goods: High Definition, Low Latency "Deciphering the real-time video live broadcast technology", "Ali Technology Sharing: E-commerce IM Message Platform, Technical Practice in Group Chat and Live Streaming", "Ali Technology Sharing: Xianyu IM's Mobile Terminal Cross-End Transformation Practice Based on Flutter", "Ali IM" Technology Sharing (3): Architecture Evolution of Xianyu Billion-level IM Messaging System" "Ali IM Technology Sharing (4): Reliable Delivery Optimization Practice of Xianyu Billion-level IM Messaging System", "Ali IM Technology Sharing (5): Timeliness Optimization Practice of Xianyu Billion-level IM Messaging System" "Ali IM Technology Sharing (6): Optimization of Offline Push Reach Rate of Xianyu Billion-level IM Messaging System" "Ali IM Technology Sharing (VII): Online Xianyu IM , Optimization Practice of Offline Chat Data Synchronization Mechanism" "Alibaba IM Technology Sharing (8): Deep Decryption of DingTalk Instant Messaging Service DTIM Technical Design"
(This article has been published simultaneously at: http://www.52im.net/thread-4034-1-1.html )


JackJiang
1.6k 声望808 粉丝

专注即时通讯(IM/推送)技术学习和研究。