1 The value of smart assistants and dialogue systems
Intelligent assistants are a booming industry with strong user demand, yet today's products are still far from satisfying users. User needs fall into three levels:
- At the first level, users demand efficiency: they will not use two sentences for something that can be done in one.
- At the second level, users want the assistant to be attentive and intelligent, much like a personal assistant.
- At the third level, the assistant serves as someone to talk to and meets users' emotional needs.
The emergence of smart devices in recent years has confirmed the "intelligence" trend predicted by Kevin Kelly. Led by Apple, manufacturers have created products that are popular around the world, such as earphones and watches, and innovative smart devices such as robots have also appeared. Explosive growth of "intelligent" devices can clearly be foreseen in industries such as smart home, wearables, and smart travel.
The British market research company Juniper Research predicts that the number of devices equipped with smart assistants will grow from 2.5 billion at the end of 2018 to 8 billion by 2023.
Xiaobu Assistant is the flagship intelligent-assistant product of OPPO, a technology company that integrates hardware and software. Since going live in 2019 it has grown very rapidly, reaching 250 million devices and 130 million monthly active users.
As OPPO's only intelligent assistant, Xiaobu Assistant covers many OPPO smart devices in addition to mobile phones. On phones, it is also available through entry points inside different apps, such as the alarm clock and the store.
Xiaobu Assistant has 260+ skills, such as weather, system control, and chit-chat. It is a product that meets a broad range of user needs.
The dialogue system, the core system of Xiaobu Assistant, covers three typical types of dialogue scenario.
- Task-based: the answer is precise, the domain is limited, and the goal is to satisfy the user with the simplest possible interaction. Example: setting an alarm.
- Question answering: the answer is broad, the domain is limited, and the goal is to satisfy the user with the simplest possible interaction. Example: encyclopedia questions.
- Chit-chat: the answer is broad, the domain is open, and the goal is the number of dialogue turns.
2 Xiaobu Assistant Dialogue System Business Architecture
The industry usually adopts the typical pipeline approach and seldom uses the end-to-end (E2E) approach. The pipeline stages are:
- ASR: takes the voice signal as input (with multimodal input as a future extension) and converts it into text.
- NLU: takes text as input and, through model classification and extraction, converts it into a structured intent and slots.
- DM: maintains the dialogue state of the context. It updates the state from the previous dialogue state and the structured input of the current turn, and outputs an action.
- NLG: takes an action as input and converts it into human-readable text.
- TTS: takes text as input and converts it into human voice audio.
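As a rough illustration of the data flow through the text stages above, the following Python sketch chains toy NLU, DM, and NLG functions. ASR and TTS are omitted, so a turn starts and ends with text. The function names, the intent and slot schema, and the single regex rule are assumptions made for this example, not Xiaobu Assistant's implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Context kept by the DM across turns (hypothetical schema)."""
    history: list = field(default_factory=list)


def nlu(text):
    """Toy NLU: classify the intent and extract slots with a single rule."""
    match = re.search(r"alarm (?:for|at) (\d{1,2}:\d{2})", text)
    if match:
        return {"intent": "set_alarm", "slots": {"time": match.group(1)}}
    return {"intent": "unknown", "slots": {}}


def dm(state, frame):
    """Toy DM: build the new dialogue state and choose an action."""
    new_state = DialogueState(history=state.history + [frame])
    if frame["intent"] == "set_alarm":
        action = {"type": "create_alarm", "time": frame["slots"]["time"]}
    else:
        action = {"type": "fallback"}
    return new_state, action


def nlg(action):
    """Toy NLG: render the action as text that TTS would speak."""
    if action["type"] == "create_alarm":
        return f"Alarm set for {action['time']}."
    return "Sorry, I did not understand that."


def handle_turn(state, text):
    """One turn: text (from ASR) -> NLU -> DM -> NLG -> text (to TTS)."""
    frame = nlu(text)
    state, action = dm(state, frame)
    return state, nlg(action)
```

Calling `handle_turn(DialogueState(), "set an alarm for 7:30")` returns the updated state together with the reply text; the state is passed back in on the next turn.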
The baseline pipeline above is sufficient for ideal, simple scenarios. In building Xiaobu Assistant's business architecture, two questions had to be considered.
First, how to integrate multiple domains. Xiaobu Assistant supports many skills, so how should skills distributed across different domains be integrated?
Second, under a multi-domain business architecture, how should performance and cost be traded off?
Xiaobu Assistant has made the following changes to the pipeline.
- A rank module is added. It can be understood as the top-level DP (dialogue policy) across domains.
- RG takes the place of NLG. In addition to text generation, it includes resource acquisition.
- A post-rank module is added to verify resources.
Multi-domain integration follows the principle of keeping each domain as autonomous as possible. It is divided into two sub-problems: ranking results across domains and integrating cross-domain results.
- Ranking results across domains: rank is responsible. To reduce complexity, it currently ranks DM sources rather than specific actions.
- Integrating cross-domain results: DM is responsible. The cross-domain intent is passed to the DM, which performs the cross-domain fusion logic.
The position of the rank module is the key to the trade-off among effect, performance, and cost.
- Rank at the end, merged with post rank: this is the multi-domain, multi-path recall-then-rank approach. Slow resources must be fully recalled, which causes performance and cost problems.
- Rank after NLU + DST: this is the reinforcement-learning MDP architecture, in which rank is equivalent to the top-level DP. At this point only NLU and DST information is available, and for the sake of the long-term effect ceiling it is preferable for more features to take part in ranking.
- Rank in the middle: a compromise among effect, performance, and cost. For performance and cost, the top action has already been selected, so a large number of slow resource requests are filtered out. For effect, the action already carries every feature except the resources themselves, which preserves as much of the achievable improvement as possible.
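The middle position can be illustrated with a small sketch: every domain's DM proposes a candidate, rank selects one, and only the winner triggers the slow resource request in RG. The `Candidate` fields, the `score` feature, and `fetch_resource` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    domain: str
    action: str
    score: float  # hypothetical ranking feature derived from NLU + DM output


fetch_log = []  # records which domains triggered a slow resource request


def fetch_resource(candidate):
    """Stand-in for RG's slow third-party resource call."""
    fetch_log.append(candidate.domain)
    return f"{candidate.domain}:{candidate.action}:resource"


def rank(candidates):
    """Top-level policy: order the DM sources, best first."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def respond(candidates):
    """Rank sits after DM and before RG, so only the winner hits slow resources."""
    top = rank(candidates)[0]
    return fetch_resource(top)
```

With three candidate domains, `respond` issues one resource request instead of three. Ranking at the very end would require all three requests before a choice could be made.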
3 Xiaobu Assistant fluency optimization practice
Fluency is a key part of the Xiaobu Assistant user experience. Between waking the assistant, speaking, and finally seeing the result, the delay the user actually perceives is the middle segment: from the end of the user's speech to the result appearing. Fluency optimization targets this user-perceivable response time (RT).
The top fluency problems Xiaobu Assistant encountered are:
- Third-party resource calls take the largest share of server time. Of the 580 ms spent on the server side, third-party resource execution accounts for more than 80%.
- Server-side speech recognition takes the second-largest share. Device-cloud speech recognition is slow, and tail recognition takes 380 ms to 600 ms.
- Client-side rendering and interaction can be more concise. For some vertical skills, client interaction can be simplified and executed faster.
The first two account for a large share, and because they depend on external factors, their RT cannot be shortened directly. This makes fluency optimization very challenging.
A few short stories illustrate how to reduce latency when third parties are outside your control. Little A and Little B compete to answer quiz questions. Each can answer a question himself or rely on outside helpers. By analogy, the dialogue system cannot control how long external calls take; it can only control its own strategy. The goal is to help Little A beat Little B.
Story 1:
Little A: Watch my copy-and-paste trick. I request all the outside helpers I know concurrently and read my notes at the same time.
Little A: Little B probably asks his helpers one after another, so I am bound to be faster than him.
After a few rounds, Little B is faster than Little A most of the time!
Little B: I worked out long ago that Helper 1 is the slowest and only covers an extra 20% of the answers.
Little B: I put my notes and helpers into tiers. Even if 20% of the cases become serial, in the other 80% I do not have to wait around. Of course I am faster than Little A.
Through this speed-based layering, Xiaobu Assistant reduces RT by about 100 ms, and the layers can be arranged flexibly.
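A minimal sketch of speed layering, assuming each source is an async call. The source names, delays, and answers are invented for the example. `answer_layered` queries all sources in a layer concurrently and only falls through to the slower layer when the faster one has no answer, which is the tiering Little B describes.

```python
import asyncio


async def call_source(name, delay, answer):
    """Stand-in for a resource call; the delay and answer are made up."""
    await asyncio.sleep(delay)
    return name, answer


async def answer_layered(layers):
    """Query each layer concurrently; fall through only when a layer misses."""
    for layer in layers:
        results = await asyncio.gather(*(call_source(*src) for src in layer))
        for name, answer in results:
            if answer is not None:
                return name, answer
    return None, None


# Fast layer: local notes plus a quick helper. Slow layer: the slowest helper.
fast_layer = [("notes", 0.01, None), ("helper_2", 0.02, "42")]
slow_layer = [("helper_1", 0.2, "slow answer")]
```

Running `asyncio.run(answer_layered([fast_layer, slow_layer]))` returns as soon as the fast layer answers, without ever calling the slow helper. Requesting everything concurrently and waiting for all results would instead pay the slow helper's latency on every question.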
Story 2:
After Little A copied Little B's strategy, he was still half a beat slower.
Little A: Have you also analyzed the characteristics of the helpers?
Little B: That's right. I found that Helper 3 supplies 40% of the answers, so when I get a question I send it to Helper 3 in advance without even thinking. Of course I respond a bit faster than you.
Through pre-launch, Xiaobu Assistant reduces RT by about 20 ms.
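Pre-launch can be sketched as starting the most frequently used resource call before understanding has finished, then keeping or discarding it. The function names, latencies, and the rule that maps text to a helper are assumptions for illustration, and the sketch presumes the pre-launched call has no side effects.

```python
import asyncio


async def understand(text):
    """Stand-in for NLU + DM latency; picks the helper that should answer."""
    await asyncio.sleep(0.02)
    return "helper_3" if "weather" in text else "helper_2"


async def call_resource(name, text):
    """Stand-in for a slow external resource call."""
    await asyncio.sleep(0.05)
    return f"{name} answered: {text}"


async def handle(text, likely="helper_3"):
    # Start the most frequently used resource before understanding finishes.
    early = asyncio.create_task(call_resource(likely, text))
    chosen = await understand(text)
    if chosen == likely:
        return await early          # hit: the request is already in flight
    early.cancel()                  # miss: discard the speculative call
    return await call_resource(chosen, text)
```

On a hit, the resource call overlaps with understanding, so the total latency is roughly the longer of the two rather than their sum.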
Story 3:
Little A refuses to give up and secretly resolves to come up with a winning move.
Little A: If I can predict the host's complete question before he has finished speaking and get a head start, won't I be far ahead?
Sure enough, Little A now answers faster than Little B!
But the good times did not last. Little A found that the helpers often gave wrong answers, costing him a lot of points.
What is the reason? Little A thought it over.
Little A: It turns out the helper treats a predicted request as a formal request. The formal request should be understood as "who publishes books at the same time as X", but it is actually understood as "who publishes books at the same time as Y".
Little A: I did not expect state to introduce side effects. Is prediction useless, then?
Prediction is already used in the industry, for example in search engines, where it requires a trade-off between user experience and cost.
In Xiaobu Assistant, the main difficulty is its large impact on the architecture. Prediction splits one request into a sequence of requests: N-1 predicted (informal) requests and 1 formal request. Downstream services cannot know whether a given request is the formal one, so the N-1 informal requests must be handled without introducing state side effects. Otherwise, confused multi-turn state leads to incorrect results.
Prediction plan 1: roll back the state of each predicted request
The implementation difficulty is that ordering is hard to guarantee, and distributed transactions are required so that the following steps run in a single transaction.
- Dialogue state rollback (undo)
- Dialogue business logic (dialog)
- Dialogue state write (write)
Prediction plan 2: write the state after the formal request completes
The implementation difficulties are:
- It is intrusive to business logic: every service that maintains business state must be modified to implement try, confirm, and cancel.
- Requests are amplified: back-end write requests increase by 1/N, where the number of predicted requests N is usually relatively small.
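The try, confirm, and cancel operations can be sketched on a toy stateful skill. `AlarmService`, its method names, and the request ids are hypothetical; the point is only that predicted requests reserve state tentatively and that only the formal request is committed.

```python
class AlarmService:
    """Hypothetical stateful skill exposing try / confirm / cancel."""

    def __init__(self):
        self.alarms = []    # committed state
        self.pending = {}   # tentative writes keyed by request id

    def try_(self, request_id, time):
        """Reserve the write without committing it."""
        self.pending[request_id] = time

    def confirm(self, request_id):
        """Commit a previously reserved write."""
        self.alarms.append(self.pending.pop(request_id))

    def cancel(self, request_id):
        """Drop a reserved write; safe to call more than once."""
        self.pending.pop(request_id, None)


def run_sequence(service, predicted, formal):
    """Try every request, cancel the predicted ones, confirm the formal one."""
    for i, time in enumerate(predicted):
        service.try_(f"p{i}", time)
    service.try_("formal", formal)
    for i in range(len(predicted)):
        service.cancel(f"p{i}")
    service.confirm("formal")
```

Every stateful skill would need its own version of these three methods, which is the intrusiveness noted above.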
Prediction plan 3: transform services to be stateless
- State persistence is unified upstream, and state reads and writes are carried through the request protocol. The dialogue state is smaller than 1 KB.
- Services that cannot be made stateless identify predicted requests and return a rejection.
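A minimal sketch of the stateless approach, assuming a single upstream component owns persistence. The `skill` and `upstream` functions, the request fields, and the `turns` counter are invented for illustration. The skill reads state from the request and returns the new state in the response, and upstream persists it only for the formal request.

```python
store = {}  # upstream-owned persistence, keyed by session id


def skill(request):
    """Stateless skill: reads state from the request, returns the new state."""
    turns = request["state"].get("turns", 0) + 1
    return {
        "reply": f"turn {turns}: {request['text']}",
        "state": {"turns": turns},
    }


def upstream(session_id, text, formal):
    """Carry state in the request; persist it only for the formal request."""
    request = {"text": text, "state": store.get(session_id, {})}
    response = skill(request)
    if formal:
        store[session_id] = response["state"]
    return response["reply"]
```

Because the skill never writes state itself, any number of predicted requests can be processed and discarded without confusing the multi-turn state.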
Overall, this solution suits Xiaobu Assistant's data volume. The architecture is simpler and more elegant, and it is friendlier to performance and availability. For the skills with prediction enabled, the overall hit rate was 42.3% and the latency gain was 173 ms.
Abstracting further, a traditional dialogue system is synchronous, while real-world dialogue is asynchronous. Real conversations do not proceed strictly turn by turn. They include asynchronous phenomena such as overlapping speech, mutual interruption, repeated interjections, and silent gaps. For a dialogue system, the rhythm of input and output is not fixed.
The industry uses full-duplex solutions to address these problems. Xiaobu Assistant builds a rhythm controller module that converts the external asynchronous rhythm, in both simplex and duplex scenarios, into synchronous processing for downstream systems. It includes input rhythm control strategies such as the prediction and pre-launch described in this article, and output rhythm control strategies not covered here, such as paving dialogue and proactive dialogue. To address audio capture that ends too early or fails to stop, an input rhythm control strategy for deciding when to stop is currently being put into practice.
4 Xiaobu Assistant Microservice Practice
Xiaobu Assistant has gone through three phases of architecture evolution:
- Initial stage: a prototype system with the core functions was launched quickly in 2019.
- Fast-running stage: with the rapid expansion of both functionality and the team as the main challenge, the architecture was upgraded to microservices.
- Sprint stage: experience and technical capability are cultivated in depth.
After the upgrade to a microservice architecture, five domains and six businesses in the business architecture could iterate quickly and independently. Speed increased by 472%, a significant result. This chapter does not cover domain design or business architecture evolution; it focuses on the microservice technical architecture.
A microservice architecture greatly increases architectural complexity, so the technical architecture best suited to the business and the team must be adopted to ensure system quality at a controllable implementation cost.
The first step in quality assurance is being able to detect faults.
- Fault discovery: thanks to OPPO Cloud's relatively complete infrastructure, the cost of building multi-dimensional monitoring is relatively low.
- Fault location: for the dialogue system scenario, a full-link debug platform that aggregates tracing points within seconds has greatly improved troubleshooting efficiency.
The second step of quality assurance is to adopt a high-availability architecture that reduces the probability of failure and provides both automatic and manual recovery.
Reducing failure probability:
- Same-city active-active: reduces the probability of a global failure caused by a single data center failure.
- Separation of light and heavy services: the Xiaobu Assistant system is algorithm-heavy, and algorithm services are usually more fragile than engineering services. Isolated deployment reduces the scope and probability of failures. Algorithm services use a sidecar for unified service governance, which further reduces the probability of failure.
Automatic recovery:
- Rate limiting and rejection: a self-developed overload rejection strategy protects upstream and downstream services from being overwhelmed. The system recovers automatically once abnormal traffic subsides.
- Circuit breaking and degradation: circuit breaking and degradation for external third-party systems, which are outside our control, needs to be especially thorough, particularly for long-lived connection services. When an external system becomes abnormal, traffic can switch immediately to a backup system for automatic recovery.
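Circuit breaking with degradation to a backup can be sketched as follows. This is a generic, simplified breaker written for illustration, not the self-developed strategy mentioned above; the class name, the consecutive-failure threshold, and the absence of a half-open recovery state are all simplifying assumptions.

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures, then uses the backup."""

    def __init__(self, primary, backup, threshold=3):
        self.primary = primary
        self.backup = backup
        self.threshold = threshold
        self.failures = 0

    def call(self, *args):
        if self.failures >= self.threshold:
            # Open: skip the failing external system entirely.
            return self.backup(*args)
        try:
            result = self.primary(*args)
        except Exception:
            self.failures += 1
            return self.backup(*args)   # degrade this single request
        self.failures = 0               # a success resets the failure count
        return result
```

A production breaker would normally also retry the primary after a cool-down period so that traffic returns once the external system is healthy again.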
Manual recovery:
- Same-city active-active: large-scale unknown faults that originate in a single unit's link can be recovered quickly by manually switching traffic, which limits the affected area.
- Gray release and rollback: for failures introduced during the release process, gray release limits the impact and rollback provides fast manual recovery. This is relatively easy for the microservices themselves. The hard part is gray release and rollback of data, including data in flight.
The third step of quality assurance is data consistency and correctness, both under normal conditions and during failure recovery. How can this be guaranteed at low cost?
- Business transparency: online data consistency issues are handled centrally so that they are transparent to the business. State writes are concentrated in the aggregation service and business services are stateless, so only the aggregation service needs to care about the consistency of storage middleware, such as same-city active-active Redis synchronization.
- Business statelessness: the protocol passes state data through to downstream business services, which read their state from it, so the business services are stateless. Most stateless business service interfaces are idempotent. A few depend on non-idempotent third-party services, so their own interfaces cannot be idempotent.
- Unit release: offline data is released one-way to online instances through the management back end of the central unit. In the dialogue system scenario the released data is mainly metadata. Full version control is done with SQLite snapshots, which guarantees data consistency and correctness during release and rollback at low cost.
Why choose SQLite for the technical architecture?
Abstractly, the problem is how to automatically release management back-end data to online services with database version control and gray release:
- Data supports version control.
- Data is released to each unit following the research and development process.
- Data takes effect and rolls back by instance, following the gray release.
The business characteristics of Xiaobu Assistant are as follows:
- The amount of metadata in the management back end is small, tens of MB.
- Data release needs minute-level timeliness.
- A release replaces a single table in full.
Given these characteristics, the choice of SQLite is considered from the following three aspects:
Aspect one: data version control
Plan one: record revision
1) Revision table with a schema. Create a separate version record table that stores the field values of the associated original table.
2) Revision table without a schema. Use a unified revision table that stores historical versions in serialized form.
3) Revise the original table. Add a version field to the original table.
The drawback of all three revision schemes is that they are not transparent to the business and require changes to the table structure.
Plan two: Flyway
It is suited to DevOps publishing, not to publishing management back-end data.
Plan three: SQLite snapshot
Full database snapshots are used for version control. A snapshot is produced by creating a SQLite database and importing the data, which amounts to using SQLite as an intermediate serialization format.
Advantages:
- Transparent to the management back-end business
- Transparent to the online service business
Suitable for whole-table or whole-database version control, not for row-level version control.
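A minimal sketch of snapshot-based version control using Python's standard `sqlite3` module. The `skill` table, its columns, and the `skills_v{version}.db` file naming are assumptions made for this example. Each version is a standalone database file, so switching versions is just a matter of opening a different file.

```python
import os
import sqlite3
import tempfile


def build_snapshot(directory, version, rows):
    """Write one full-table snapshot as a standalone SQLite file."""
    path = os.path.join(directory, f"skills_v{version}.db")
    conn = sqlite3.connect(path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE skill (name TEXT PRIMARY KEY, enabled INTEGER)"
        )
        conn.executemany("INSERT INTO skill VALUES (?, ?)", rows)
    conn.close()
    return path


def read_snapshot(path):
    """Read every row of a snapshot, ordered by name."""
    conn = sqlite3.connect(path)
    rows = conn.execute(
        "SELECT name, enabled FROM skill ORDER BY name"
    ).fetchall()
    conn.close()
    return rows
```

Because every snapshot contains the full table, an older version stays readable unchanged after newer versions are published, at the cost of duplicating rows that did not change.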
Aspect two: releasing data by unit
Plan one: binlog
The progress of a data release is hard to control, and it is not possible to release to development and testing only without also releasing to production.
Plan two: MQ synchronization
This incurs additional research and development cost for data synchronization and release.
Plan three: snapshot file synchronization
This relies on object storage to synchronize snapshot data between units.
The advantage is that the existing file release solution can be reused directly.
Suitable for minute-level releases, not for second-level releases.
Aspect three: releasing and rolling back data by instance gray scale
Plan one: memory cache + triggered loading
After the data in the database is updated, instances load it when triggered by a restart, an MQ message, and so on.
Normal publishing works without problems. In an abnormal situation, such as rolling back instance 1, the database does not have the V2 data, which affects the correctness of data loading when the instance recovers.
Plan two: MQ release and rollback
As with MQ synchronization, this introduces additional research and development cost for data release.
Plan three: database embedded in the instance
SQLite is embedded in the instance, so the instance can load the V2 snapshot to recover.
Advantages:
- Instance isolation is the strongest.
- Version snapshots support fast data rollback and recovery.
Suitable for small data volumes, not for large data volumes.
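Per-instance release and rollback with an embedded database can be sketched as below. In-memory SQLite databases stand in for snapshot files, and the `Instance` class, the version labels, and the `skill` table are hypothetical names used only for illustration.

```python
import sqlite3


def make_snapshot(names):
    """Build an in-memory snapshot; a real one would be a downloaded file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE skill (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO skill VALUES (?)", [(n,) for n in names])
    conn.commit()
    return conn


class Instance:
    """One service instance holding its own embedded, versioned metadata."""

    def __init__(self, snapshots, version):
        self.snapshots = snapshots
        self.version = version

    def switch(self, version):
        """Gray release or rollback: point this instance at another snapshot."""
        if version not in self.snapshots:
            raise KeyError(version)
        self.version = version

    def skills(self):
        conn = self.snapshots[self.version]
        cursor = conn.execute("SELECT name FROM skill ORDER BY name")
        return [row[0] for row in cursor]
```

Switching one instance to V2 leaves the others on V1, and switching it back restores the earlier data without touching a shared database.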
In summary, SQLite suits key metadata that is small in volume, released at minute-level granularity, and fully controlled, and it achieves low-cost, high-quality release and rollback for that data.
5 Summary
This article introduces the background and value of intelligent assistants and dialogue systems, as well as the engineering practice of Xiaobu Assistant. The technical points are summarized as follows:
Author profile
Xiao OPPO Dialogue System Backend Expert
Responsible for building the back-end system of Xiaobu Assistant Dialogue System from 0 to 1, as well as engineering architecture planning and research and development.