In the next ten years, intelligence will continue to penetrate every field, changing how we live and work. This article shares the technical practice of Alibaba's front-end intelligence and explores the future direction of UI intelligence.

Recently, I came across a judgment: the past ten years were a decade of vigorous development of intelligence itself, while the next ten years will be a decade in which intelligence penetrates every field and continues to change our lives and work. I find this convincing.
As dedicated processors for neural-network acceleration become standard hardware, DSPs move from professional fields into consumer electronics, CPU manufacturers acquire FPGA companies, and advances such as 4nm processes, 3D stacking, and single-cycle multi-instruction-issue processors keep pushing PPA (power, performance, area) past its previous limits. These improvements in computing power are about to push the technological change around intelligence to a climax.
As a group of Alibaba front-end engineers committed to transforming the front-end engineering system, we strive to stay at the forefront. From cooperation with Google's TensorFlow team, to publishing papers at top conferences and being included in technology radars, to the large-scale adoption of many achievements inside the group and across the industry, the Alibaba front-end intelligence team has kept moving forward. In this article, we select the parts of Alibaba's overall front-end intelligence picture where substantial breakthroughs have been made, and share our research and practice. We look forward to exploring front-end intelligence together with more front-end peers, and to making this direction richer, more realistic, and more inclusive.
[Figure: Alibaba front-end intelligence technology overview]

Article outline (there is a lot of content; feel free to jump straight to the part that interests you)

  • Front-end intelligence technology base: data and algorithm engineering capabilities are the foundation of the upgrade
  • On-device intelligence: how introducing on-device intelligence brings data and algorithm capabilities into real front-end technical scenarios
  • Front-end intelligence technology applications
  • The future development of UI intelligence

Front-end Intelligent Technology Base

2021–2022 was a year of visible change for the PipCook project. Before that, PipCook was mainly an engineering framework that made deep learning easy to use for the JS community, and it often neglected the connection to the business and the value of the data itself. This year, PipCook relied mainly on the cloud platform to open up business data links and strengthen front-end data capabilities. At the same time, building on the analysis and modeling capabilities provided by DataCook, PipCook analyzes business data in depth, ties itself closely to the business around consumer-experience issues, and establishes data-guided iteration links that deliver value to the business.

PipCook-Cloud enhances front-end data analysis and intelligent algorithm capabilities

In the traditional product-and-engineering structure, with product managers, designers, operations, business, server, client, front-end, testing, BI, and many other roles, the front-end has always been a functional role that participates in the business only indirectly and has no direct impact on its development. However, to drive growth for the business, the front-end, as the bridge that delivers value to users, must systematically and deeply understand the relationship between merchants and consumers, superimpose the user perspective on the platform and ecosystem perspectives, and establish ways and means of user-operation technical empowerment that combine organically with business, platform, and ecosystem operations.
Under this requirement, data analysis and data-based modeling capabilities are crucial. Through the PipCook-Cloud project we hope to strengthen front-end data analysis and intelligent algorithm capabilities, and gradually link technical output, user behavior data, and business indicators together. Through the indicators that connect the three, we can systematically draw a complete picture from business goals to product design to technical delivery and user experience. On this picture, indicator-driven and data-supported user-operation capabilities gain technical support, product delivery value is measured by user-experience indicators, and the front-end has a complete bridge of user value and emotional connection across the entire business life cycle.
At present, our big-data development is carried out mainly with the DataWorks tool suite. The main development language is ODPS SQL, UDFs (user-defined functions) can be developed in Python and Java, and some data-model development may also be based on the PAI platform. These ODPS SQL, Python, and Java workflows are somewhat expensive for front-end students, for two main reasons. On the one hand, there are entry barriers: the languages themselves, the orchestration of processing logic, and understanding of the data. On the other hand, the front-end lacks accumulated experience with common business data analysis (on-device user behavior analysis, crowd-segmentation behavior analysis, front-end performance analysis, click heat-map analysis, and so on), which makes it hard to directly reuse data-analysis capabilities for common front-end scenarios.
At the same time, current report and data-visualization tools are aimed mainly at BI, operations, and data PDs, building reports by visually configuring data sources and dragging chart components. For front-end students this approach lacks flexibility: only the prepared chart components and capabilities can be used, and it is hard to extend and customize reports for their own business, even though the front-end has natural advantages in visualization. It is also hard to deliver, via templates and similar means, the visual reports we have accumulated for front-end data-analysis scenarios.
For these reasons, PipCook-Cloud focuses on front-end data capabilities and on user-experience and user-growth analysis. This year it mainly solved the following problems:

  • Let front-end students understand what data development can do and what value it brings to the business: the first step in getting front-end students started with data development is to show what it can do, and that business data analysis genuinely brings value to the business, so that users are motivated to pick up PipCook-Cloud.
  • Let front-end students clearly orchestrate the whole flow of their own data development: just as designing a system before programming it can be done with flowcharts, ER diagrams, and similar tools, writing a data-development link also needs a design first, one that clearly expresses and orchestrates the entire data flow.
  • Let front-end students write data-processing logic with a low threshold and at low cost: big-data systems are generally developed in SQL and Java; a JS-based data-processing system would greatly lower the threshold for front-end students to implement their own processing logic (see the sketch after this list).
  • Let front-end students move quickly when they need essential machine-learning model capabilities: in some scenarios a data-analysis task cannot be implemented with deterministic logic, and model capabilities are needed to mine patterns in the data.
  • Let front-end students easily build data-backed business reports to illustrate their conclusions and help the business understand: the last step of data analysis is producing conclusions, and in many cases business reports and charts are required to clearly explain the conclusions of the analysis and the follow-up actions.

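For the third point above, here is a hedged sketch of what a purely JS-based data-processing step could look like; the record fields and the aggregation goal are invented for illustration and are not PipCook-Cloud's actual API:

 // Hypothetical raw records: exposures and clicks per page.
const records = [
  { page: "home", exposures: 1200, clicks: 96 },
  { page: "detail", exposures: 800, clicks: 120 },
  { page: "cart", exposures: 300, clicks: 15 },
];

// Orchestrate the data flow as plain array transforms instead of ODPS SQL:
// drop invalid rows, derive the click-through rate, and rank pages by CTR.
const report = records
  .filter((r) => r.exposures > 0)
  .map((r) => ({ page: r.page, ctr: +(r.clicks / r.exposures).toFixed(3) }))
  .sort((a, b) => b.ctr - a.ctr);

console.table(report);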
In 1.0, we worked to make PipCook a framework that enables front-end engineers to develop machine learning applications; in 2.0 we adjusted this slightly: enabling front-end engineers to develop reliable machine learning applications.
Over the past year, we continued to focus on the deep-learning CI/CD capabilities provided by PipCook, offering continuous integration and delivery of model training and data analysis. PipCook 1.0 let web developers start down the deep-learning road with a relatively low threshold. In practice we also found problems: PipCook was hard to install and installation took a long time; the first training run spent a lot of time installing plug-ins; and configuration was difficult and getting started was hard. Solving these problems led to PipCook 2.0, which has also been open-sourced [08].

DataCook enhances data analysis and processing capabilities

If PipCook-Cloud is the bridge between the front-end and business data, providing engineering capabilities for data processing and data development through platform-level productization, then DataCook is the fuel underneath the PipCook platform. DataCook is a data-science and machine-learning library for the JS ecosystem that helps front-end students easily analyze their own business data with modeling capabilities and produce valuable analysis results.
At present, DataCook covers data preprocessing, statistical analysis, machine learning, visualization, and cross-platform deployment, and provides models such as linear regression, logistic regression, decision trees, and PCA for modeling analysis in many scenarios. DataCook has also been open-sourced [09].
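To give a feel for the kind of modeling DataCook enables, here is a minimal least-squares line fit in plain JavaScript; it intentionally does not use DataCook's own API, only the underlying technique:

 // Ordinary least squares for y = a*x + b, written out by hand.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, v) => s + v, 0) / n;
  const meanY = ys.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY); // covariance term
    den += (xs[i] - meanX) ** 2;              // variance term
  }
  const a = num / den;
  return { a, b: meanY - a * meanX };
}

const { a, b } = fitLine([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]);
console.log(`y = ${a.toFixed(2)}x + ${b.toFixed(2)}`); // y = 1.94x + 0.15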

On-device Intelligence

Last year, the Taobao businesses began to focus on consumer experience. Building exploratory product supply and intelligent information expression on top of personalized consumption scenarios is an opportunity for on-device intelligence, which provides intelligent capabilities with real-time performance and privacy protection, so that the front-end can keep optimizing the consumer experience with these technical means.
Over the past three years, we have reached a strategic cooperation with Google's TensorFlow.js team to promote front-end and on-device intelligence applications. Web standards bodies have also been intensively rolling out acceleration technologies such as WebCL (see the third edition of "OpenCL 2.0 Heterogeneous Computing"), WebGL, WebGPU, WebNN, and WASM; in particular, the Web Neural Network API published in 2022 (https://www.w3.org/TR/webnn/) targets acceleration of neural-network, tensor, and graph computation.
Compared with native clients, front-end computing power has not been fully unleashed. On the CPU side, although WASM's support rate is higher than that of many new JavaScript features, support for SIMD instructions is still low; so although WASM has advantages over JavaScript, compared with JavaScript+WebGL it cannot fully exploit the CPU, wasting computing resources. On the GPU side, WebGL is technology from more than a decade ago, of the same era as OpenGL ES, and does not support GPGPU; WebGL2 does support a compute pipeline, but it cannot be used because Apple does not support that part. As for WebGPU, the caniuse data shows that desktop browsers began supporting it around 2018–2020, and mobile browsers are following up.
So in 2021 we chose two directions to optimize our on-device engine: WASM+Rust+SIMD and WebGPU. TensorFlow.js does not use Rust, but it too relies on WASM+SIMD.
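How such a choice can be probed at runtime is sketched below, using the open-source wasm-feature-detect package for SIMD detection and the standard navigator.gpu check for WebGPU; the returned backend names are placeholders of ours, not a real engine API:

 import { simd } from "wasm-feature-detect";

// Probe the available acceleration paths and pick the best one.
async function pickBackend() {
  if ("gpu" in navigator) return "webgpu"; // WebGPU adapter API is exposed
  if (await simd()) return "wasm-simd";    // WASM SIMD instructions validate
  return "webgl";                          // fallback
}

pickBackend().then((backend) => console.log(`using backend: ${backend}`));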

WASM + Rust + SIMD

To move off JavaScript and write WASM in a native language, the first question is which language to choose. In 2021 Rust gained experimental SIMD support in Nightly, making it our best bet at the moment.
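Whatever the kernel language, the JS side loads the compiled module through the standard WebAssembly API. The sketch below assumes a hypothetical Rust-compiled kernel.wasm that exports a sum(ptr, len) function, its linear memory, and the linker's __heap_base global; those export names and the offset handling are simplifying assumptions:

 // Load a hypothetical Rust-compiled kernel and call it from JS (in a module).
const { instance } = await WebAssembly.instantiateStreaming(fetch("kernel.wasm"));
const { memory, sum, __heap_base } = instance.exports; // assumed exports

// Copy the input into WASM linear memory at the (assumed) heap start.
const input = new Float32Array([1, 2, 3, 4]);
const offset = __heap_base.value;
new Float32Array(memory.buffer, offset, input.length).set(input);

console.log(sum(offset, input.length)); // expected: 10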

WebGL/WebGPU

Let's use a practical example to see how to run a model in the browser under pure W3C standards with the help of the TensorFlow.js ecosystem, and accelerate prediction with WebGL. To run a neural network in the browser and call it from JavaScript, the network first needs to be saved to files.

 model.save('saved_model/w4model')

After saving, the directory contains two files (keras_metadata.pb and saved_model.pb) and two directories (assets and variables). This is TensorFlow's tf_saved_model format; the format name will be passed as a parameter when converting the model for use with TensorFlow.js in the browser.
To make the model run in the browser, first install the converter with pip install tensorflowjs, then use the tensorflowjs_converter command-line tool to convert the model:

 tensorflowjs_converter --input_format=tf_saved_model \
--output_node_names="w4model" \
--saved_model_tags=serve ./saved_model/w4model ./web_model

Here the --input_format value tf_saved_model corresponds to the format the model was saved in; note that files saved in different formats are incompatible with each other. --output_node_names is the name of the model, and --saved_model_tags selects which MetaGraphDef to load; its default value is serve.
After conversion with tensorflowjs_converter, we see two files in the web_model folder: group1-shard1of1.bin and model.json. The .json file is the model-definition file and the .bin file is the model's weight file. You can think of the definition file as a function (here a 4PL curve plus polynomials) and the weight file as the function's parameters.
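For intuition, the 4PL (four-parameter logistic) curve mentioned above can be written as an ordinary function, with the weight file playing the role of its four parameters; the parameter values below are made up for illustration:

 // Standard 4PL form: y = d + (a - d) / (1 + (x / c)^b)
// a: lower asymptote, d: upper asymptote, c: inflection point, b: slope.
const fourPL = (a, b, c, d) => (x) => d + (a - d) / (1 + (x / c) ** b);

const f = fourPL(0, 1.5, 1000, 400); // made-up parameters, not the real weights
console.log(f(500)); // the .bin file stores the learned equivalents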
Initialize a Node.js project with npm, then add to the package.json configuration file:

 "dependencies": {
    "@tensorflow/tfjs": "^3.18.0",
    "@tensorflow/tfjs-converter": "^3.18.0"
  },

Here @tensorflow/tfjs is the TensorFlow.js runtime dependency, and @tensorflow/tfjs-converter is the dependency needed to load our converted model. Next, add to the entry-point JavaScript file:

 import * as tf from "@tensorflow/tfjs";
import { loadGraphModel } from "@tensorflow/tfjs-converter";

This imports the dependencies into the program file. Note that loadGraphModel is imported from @tensorflow/tfjs-converter. Although @tensorflow/tfjs provides a tf.loadGraphModel() method, that method only applies to models saved by TensorFlow.js itself; a model saved with model.save() in Python and converted by the converter must be loaded with the loadGraphModel method from the tfjs-converter package.
Then there is the complete code of the program:

 import * as tf from "@tensorflow/tfjs";
import { loadGraphModel } from "@tensorflow/tfjs-converter";

window.onload = async () => {
  const resultElement = document.getElementById("result");
  const MODEL_URL = "model.json";

  // Load the converted graph model and time it.
  console.time("Loading of model");
  const model = await loadGraphModel(MODEL_URL);
  console.timeEnd("Loading of model");

  // Wrap the raw input values as a tensor before prediction.
  const test_data = tf.tensor([
    [0.0],
    [500.0],
    [1000.0],
    [1500.0],
    [2500.0],
    [6000.0],
    [8000.0],
    [10000.0],
    [12000.0],
  ]);
  tf.print(test_data);

  console.time("execute");
  let outputs = model.execute(test_data);
  console.timeEnd("execute");
  tf.print(outputs);
  resultElement.innerText = outputs.toString();
};

Note that since the model takes tensors as input for prediction, we use tf.tensor() to wrap the input data as a tensor before passing it to the model. After running the program, we can see the debug information printed in the browser developer-tools console:

 [Violation] 'load' handler took 340ms
index.js:12 Loading of model: 67.19482421875 ms
print.ts:34 Tensor
    [[0    ],
     [500  ],
     [1000 ],
     [1500 ],
     [2500 ],
     [6000 ],
     [8000 ],
     [10000],
     [12000]]
index.js:28 execute: 257.47607421875 ms
print.ts:34 Tensor
    [[-1.7345995 ],
     [90.0198822 ],
     [159.9183655],
     [210.0600586],
     [260.0179443],
     [347.4320068],
     [357.5788269],
     [367.5332947],
     [377.4856262]]

Here loading the model took 67 ms and executing the prediction took 257 ms, which is rather long. To speed this up, we enable WebGL hardware acceleration by setting the backend:

 import * as tf from "@tensorflow/tfjs";
import { loadGraphModel } from "@tensorflow/tfjs-converter";

window.onload = async () => {
  // Newly added: enable WebGL hardware acceleration
  await tf.setBackend("webgl");
  // Print the current backend
  console.log(tf.getBackend());
  const resultElement = document.getElementById("result");
  const MODEL_URL = "model.json";
  // ... the rest is the same as before

The console prints webgl, which means TensorFlow.js has successfully enabled WebGL acceleration. With it, prediction time drops sharply from 257 ms to 131 ms, and repeated predictions are faster still, around 78 ms, because the weights and the computation graph are already loaded into video memory.
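Since the first WebGL inference pays for shader compilation and uploading weights to video memory, a common practice (an addition of ours, not shown in the code above) is a throwaway warm-up prediction before timing the real ones:

 // Warm up: one dummy prediction triggers shader compilation and weight upload.
const warmup = tf.tensor([[0.0]]);
model.execute(warmup).dataSync(); // dataSync() blocks until the GPU finishes
warmup.dispose();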
Looking to the future, however difficult it is, we will keep pushing the WASM+SIMD/WebGPU engine toward production and invest in supporting WebXR, especially AR. AR is, in essence, a machine-learning problem: Facebook, now renamed Meta, is training models larger than GPT-3 to understand text, images, and speech in AR contexts. Once such models are trained, running inference on the phone will be a major test for front-end intelligence, and it is an important direction for our research and technology reserves.

Front-end Intelligent Technology Application

Application 1: Code Recommendation - sophon

In 2019, machine-intelligence technology represented by deep learning was making continuous breakthroughs in academia, and excellent code-recommendation services such as TabNine and aiXCoder emerged in industry. But they all initially relied on cloud services, which is unfriendly to code security, so, based on our intelligence practice and exploration, we built a front-end code-recommendation plug-in, Sophon Code IntelliSense [10], hoping to improve front-end R&D efficiency through code completion. Throughout 2021 we kept climbing in the C2C direction; besides deep optimization of the model, this year we also spent much effort improving the engineering infrastructure, promoting knowledge about code intelligence, and tracking the latest developments in the industry.

Code Data Asset Management

Besides continuous optimization of code-recommendation capability, another focus is building asset management for the code-intelligence field. Every excellent deep model depends on an accurate, standard dataset, and such a dataset serves as an "evaluation benchmark" while driving continuous optimization of the algorithm models. As the R&D team behind Alibaba's big-data and AI platforms, we know how to obtain, process, and manage data, so we took the initiative to build the data-asset-management platform for code intelligence. After a year of exploration and practice, a basically mature solution for dataset management has formed, as follows:

  • Based on the Git OpenAPI, obtain open-source code, typically from repositories with more than 100 stars; after processing (excluding non-front-end code, deleting test code, and splitting into code blocks), upload the code files to OSS for storage.
  • Use a script to fetch the OSS code files, structure them, and import them into an ODPS table; basic DQC quality monitoring is set on the ODPS table to ensure data availability.
  • Create SQL tasks that process data from the ODPS base table and load it into per-subtask ODPS tables, each sub-table with monitoring rules matching its subtask.
  • Run the above process on a regular cycle, forming time-stamped versions for storage and management.
  • Based on the ODPS table data, generate data-service APIs for direct use, or export the data as text files for models to read.

The above constitutes an automated code-dataset management scheme, on which dataset asset management for code intelligence can easily be realized.
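As a rough, hedged approximation of the first step, the snippet below fetches repositories above a star threshold from the public GitHub search API; the real pipeline uses an internal Git OpenAPI, so the endpoint and fields differ:

 // Fetch JavaScript repositories with more than 100 stars (public GitHub API).
// Run inside an ES module so top-level await is available.
const res = await fetch(
  "https://api.github.com/search/repositories" +
    "?q=language:javascript+stars:%3E100&sort=stars&per_page=5"
);
const { items } = await res.json();
for (const repo of items) {
  console.log(repo.full_name, repo.stargazers_count);
}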
This year we also explored the code-intelligence field continuously. Judging from the industry's overall development, we expect code-intelligence recommendation to remain in a stage of slow ascent for a long time. So we will first continue to optimize the model, and second, keep building the asset-management platform, hoping to help more teams build their own code-recommendation and related services and genuinely improve front-end R&D efficiency and quality.

Application 2: Generating Code from Design Drafts - imgcook

imgcook is an intelligent code-generation platform for design drafts: with one click it generates maintainable front-end code (HTML, React, Vue, mini programs, Flutter, etc.) from design drafts (Sketch, Figma, PSD, images). imgcook is now four years old; 33,400+ users have uploaded 96,200+ pages on the platform, generating a total of 70.07 million+ lines of code. As a basic service, imgcook's D2C capability has been called by more than 10 BUs and departments within the group, helping front-end developers improve coding efficiency.
Over the past few years, imgcook has gone through several major stages, with continuous enhancement of product capabilities, continuous exploration of intelligent technology, and some twists and turns. Version 3.0 upgraded and improved the technical system and provided a one-stop R&D solution for Tianma module development, while deepening the front-end intelligence direction: exploring intelligent solutions across the D2C technology layers, raising the degree of intelligence, and closing the iteration loop for some models to solve problems such as code semantics in D2C.
This stage focuses on D2C one-stop R&D and layer-by-layer improvement of D2C intelligent capabilities. Here is a demo combining the one-stop R&D link with intelligent recognition:
[Figure: imgcook 3.0 product demo]

In the 3.0 stage, new intelligent solutions were explored in each technical layer of D2C, including layer parsing, material recognition, layout recognition, and logic generation. For the D2C intelligence-improvement plan and its implementation, see the series "AI assists 90.4% of Double 11 module code generation" [11].
[Figure: imgcook 3.0 technical framework]

From 1.0 to 3.0, imgcook's productization, openness, marketing-link R&D solutions, and intelligence exploration kept iterating, yet users still reported problems: non-standard design drafts, unreasonable structure in the generated code, redundant styles, and so on. What is the core issue behind this? The D2C core process has only three steps: layer parsing, layout analysis, and DSL conversion to generate code.

  • Layer parsing, Design to JSON: extracts the element information from Sketch, Figma, PSD, image, and other design drafts and converts it into an absolutely positioned JSON description without hierarchy, mainly including:
    Node type: Div, Image, Text
    Style information: opacity, color, background, fontSize, border, etc.
  • Layout analysis, JSON to JSON: converts the flat, absolutely positioned JSON description into a JSON description with hierarchy and relative positioning (a toy sketch follows this list), mainly including:
    Parent-child containment relationships and DOM hierarchy
    Sibling spacing, padding, margin, and other layout styles such as display, position, etc.
  • DSL conversion, JSON to Code: converts the structured, semantic JSON description into front-end code, mainly including:
    DSL type: React, Vue, Android, mini program, etc.
    CSS type (less, css, scss), style-introduction methods (inline styles, CSS class names), and style units (px, rem, vw, rpx)

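The toy sketch below illustrates the layout-analysis step on invented node data: each absolutely positioned node is nested under the smallest box that contains it, and its coordinates are rewritten relative to that parent:

 // Toy layout analysis (JSON to JSON); node shapes are invented for this sketch.
const nodes = [
  { id: "card", type: "Div", x: 0, y: 0, w: 200, h: 100 },
  { id: "title", type: "Text", x: 10, y: 8, w: 120, h: 20 },
];

const contains = (p, c) =>
  c !== p && c.x >= p.x && c.y >= p.y &&
  c.x + c.w <= p.x + p.w && c.y + c.h <= p.y + p.h;

const roots = [];
for (const node of nodes) {
  // Pick the smallest node whose box fully contains this one, if any.
  const parent = nodes
    .filter((p) => contains(p, node))
    .sort((a, b) => a.w * a.h - b.w * b.h)[0];
  if (parent) {
    // Rewrite absolute coordinates as relative to the parent.
    (parent.children ??= []).push({ ...node, x: node.x - parent.x, y: node.y - parent.y });
  } else {
    roots.push(node);
  }
}

console.log(JSON.stringify(roots, null, 2)); // "title" is nested under "card"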
For an imgcook user, what matters most is the result: the layout nesting structure and layout styles of the generated code. If the generated layout structure is unusable, then business-logic generation and other intelligent semantic enhancements are unusable too, and no matter how complete the supporting R&D link is, it is useless. If the manpower invested in imgcook is 100%, at least 80% should go into the layout nesting structure and layout styles.

When using imgcook to generate code, you may get code that falls short of expectations, with detail problems such as:

  • Lines, shadows, gradients, and other effects that could be implemented in CSS are replaced by images in the generated code, producing many unnecessary image resources.
  • The generated layout has too much unnecessary absolute positioning and does not make reasonable use of Flex layout.
  • The generated code styles are redundant and do not match the code style of the user's own project.

These are the issues users care about most, and they appear in many details of the code-generation process. Therefore, over the past year we focused on core capability and user experience: starting from style parsing of the design draft, we cleaned up every problem on the path from Sketch to CSS; at the layout-algorithm layer we decoupled the internal layers and added features such as a "width and height setting strategy", "Flex layout", and "same-layer grouping", plus an "error-correction layer" to absorb some of the noise introduced by design drafts. We also improved the recognition accuracy of the existing mature models and applied them to design-draft parsing and layout analysis.

Improved code accuracy

Layer parsing: parsing styles from the design draft
Layer parsing (Design to JSON) extracts the element information from Sketch, Figma, PSD, image, and other design drafts and converts it into an absolutely positioned JSON description without hierarchy, mainly including:

  • Node type, Div, Image, Text
  • Style information, opacity, color, background, fontSize, border, etc.

However, there are many differences between style descriptions in Sketch and in CSS. Moreover, because designers care only about the visual effect, their design methods and operations are unconstrained, and some "irregular" layers appear. The differences in each case need to be teased out and parsed.
For example, when designers build high-fidelity UI pages in Sketch, they complete them with a series of tools: adding shape, image, text, and other layer types, then adding styles to a layer in the style panel (area 3 in the panel illustration below).
To convert the layers produced by these varied operations into CSS styles, we must sort through all of Sketch's design operations and parse them into CSS; anything missed or mis-parsed shows up as detail problems in the generated code.
[Figure: Sketch design panel]
We have catalogued the differences between design styles in Sketch and CSS styles on the web, their parsing methods, and the scenarios where the design draft cannot be parsed.
[Figure: test cases for various design operations in Sketch]

Layout analysis: Generate layout tree and layout style <br>The job of layout analysis (JSON to JSON) is to convert the JSON description without hierarchical structure and absolute positioning into JSON description with hierarchical structure and relative positioning through layout analysis, which mainly includes :

  • Parent-child node containment relationship, DOM hierarchy
  • Sibling node spacing, padding, margin, etc.
  • Layout style, display, position, etc.

The new version of the layout algorithm mainly solves these major problems:

  • Layout noise caused by irregular design: some design drafts contain redundant colorless nodes and nodes with zero height or width, collectively called "non-presentation elements"
  • Flex layout: blank areas used to be filled with margins, but in some scenarios space-between or padding is more reasonable
  • Text width and height settings

Code generation: DSL conversion generates code
DSL conversion (JSON to Code) converts the structured, semantic JSON description into front-end code, including:

  • DSL type: React, Vue, Android, mini program, etc.
  • CSS type: less, css, scss, etc.
  • Style-introduction methods: inline styles, CSS class names, etc.
  • Style units: px, rem, vw, rpx (see the unit-conversion sketch below)

In the old official DSLs, only conversion between DSL types was supported; CSS type, style-introduction method, style units, and common-style extraction were not.
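As a toy illustration of the style-unit item in the list above, here is a px-to-rem rewrite of a flat style object, assuming a 16px root font size; the style shape is invented:

 // Convert numeric px values in a flat style object to rem (root font: 16px).
const toRem = (style, rootFontSize = 16) =>
  Object.fromEntries(
    Object.entries(style).map(([k, v]) =>
      typeof v === "number" ? [k, `${(v / rootFontSize).toFixed(3)}rem`] : [k, v]
    )
  );

console.log(toRem({ fontSize: 24, width: 200, color: "#333" }));
// → { fontSize: '1.500rem', width: '12.500rem', color: '#333' }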

Intelligent capability improvement

After the extensive exploration of intelligent solutions in the 3.0 stage, several mature solutions have been consolidated into models that deliver practical value in the D2C link. Over the past year, the work was mainly to optimize training samples through a closed-loop iteration link to improve the models' recognition accuracy.
The recognition ability of the semantic deep-learning models for text and images was strengthened, improving background-image/fragmented-image merging and code semantics.
A machine-learning algorithm was introduced to identify whether a blank area in the design draft is "active" ("FreeSpace") or "static" ("Margin/Padding"), which is used for layout-style generation. It is currently in beta; try it at: https://www.imgcook.com/editor#/?from=flexboxLayout

Product experience optimization

imgcook's product capabilities are relatively complete and comprehensive, involving a wide range of aspects, such as:

  • Design draft analysis includes: Sketch plug-in, PSD plug-in, Figma plug-in, Sketch file upload analysis, PSD file upload analysis, image file upload analysis
  • Collaboration management includes: workstation team/project/page management, team configuration
  • Developer services include: custom component library, custom editor, custom canvas, custom DSL, custom Plugin, custom model, custom business logic library, API interface service opening
  • Engineering efficiency includes: the imgcook command-line tool, the VS Code imgcook plug-in
  • 10+ officially maintained DSLs include: React, Rax, Vue, HTML5, WeChat mini program, etc.
  • Other aspects, such as the case showcase gallery and the editor

With so many functional modules come many user-experience problems. For example: the Sketch plug-in bundled code export with design-resource management, so front-end users reported that most of the interface was unneeded; supporting only GitHub login was unfriendly to designers and slow; having to manually create a team project on first use broke the code-generation flow; retrieval became troublesome once there were many modules; and code generation had diverse requirements such as responsive adaptation and extraction of common styles.
Over the past year, we carried out comprehensive experience optimization of the frequently used functional modules from the user's point of view. The core includes the following aspects:

  • The Sketch plug-in has been newly upgraded, greatly improving the interactive experience and parsing accuracy;
  • Support diversified code output settings, for example, you can choose class name style, different conversion style units, style import methods, public style extraction, etc.;
  • The editor experience is newly upgraded, the interface is clearer, the interaction is smoother, and supports batch downloading of image resources;
  • Support DingTalk login and third-party account unbinding;
  • More effective information expression on the home page;
  • The workstation provides a more convenient and effective management interface, supporting pinned frequently used teams and filtering projects and modules by "my favorites" and "created by me", etc.;
  • Showcasing selected user cases

Business Application of imgcook: Smart UI

Personalized product recommendation has brought very significant growth to the business, but as large numbers of products go online, product data surges, and even with personalized recommendation a great deal of product information is left for users to weigh. Too heavy a decision burden leads to user churn.
It is therefore necessary to shorten the user's shopping-decision time through intelligent UI (personalization of the UI), showing the content users care about most within limited space and creating more growth for the business. Our current UI-personalization strategy is: identify the user's feature tags, divide users into different identity portraits, design different UIs, and present them to users individually.
By decomposing a UI card into the concepts of "layout" and "material", we can assemble UIs flexibly and freely. Then, by designing different UI styles for different information, the key information in a card is highlighted, "guiding" the user's attention to what they care about most and providing a better UI experience.
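A toy sketch of the layout-and-material idea: even a few layouts and materials cross-multiply into many candidate UI variants for the recommendation algorithm to rank; all names are invented:

 // Assemble candidate UI variants as the cross product of layouts and materials.
const layouts = ["image-left", "image-top", "text-only"];
const materials = ["price-tag", "coupon", "rating", "hot-sale-badge"];

const variants = layouts.flatMap((layout) =>
  materials.map((material) => ({ layout, material }))
);

console.log(variants.length); // 12 variants from 3 layouts x 4 materials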
By decomposing the UI design into sub-modules in this way, we can flexibly assemble feed cards, adjusting materials and layouts to generate dozens or even hundreds of UI combinations. Combined with the algorithm's UI-recommendation ability, we can accurately match users with UIs and improve card click-through rate.
Smart UI performed prominently in 2021, delivering strong business improvements in 1688, the Taobao businesses, and Maochao (Tmall Supermarket). Going forward, smart UI will cover more shopping-guide scenarios, improving business efficiency from the technical side through personalized UI expression. Our goal for intelligent UI is ambitious: through end-to-end link upgrades and productization, we hope to make intelligent UI a group-wide UI-expression strategy within Alibaba, shortening the time users spend filtering information and improving the efficiency of finding goods.
Facing the future, we will keep working in two directions: business access and effect improvement, and iterative upgrading of technical capabilities:

  • Business effect improvement:
    Comprehensive access to commodity materials: fully integrate Maochao commodity materials and connect differentiated material algorithms for different scenarios
    Granularity capability upgrade: deeper algorithm mining along the category, crowd, and commodity dimensions
  • Full-link capability upgrade: improve access efficiency and further ensure link stability; build smart UI into a basic technical capability; combine Wingpower backend capabilities and Maochao customization capabilities to migrate productized access for intelligent UI; isolate link-access environments and guarantee comprehensive anomaly monitoring

The new year has begun, and imgcook will continue to make efforts in three directions:

  • Business empowerment: integrate deeply with Alibaba's business scenarios and embed seamlessly into Alibaba's R&D processes, forming per-business R&D solutions that help improve delivery efficiency.
  • R&D efficiency: focus on the usability of generated code, benchmarking the maintainability requirements of the group's professional front-ends, to generate highly usable code.
  • Solid foundations: accumulate high-quality datasets and improve deep-learning model accuracy, helping improve the usability of generated code.

Looking forward to the future: UI intelligence

In August 2019 I visited Google's headquarters in Mountain View. That experience gave me a new understanding of intelligent UI: not just making UI styles and information expression intelligent, but perceiving and understanding the user's intent through the UI and the information behind it, and using intelligent means to help users achieve that intent, a real AI User Interface. By connecting users with the digital world, and through the digital world with the physical world, what users operate will actually change from applications to real-world services.
The service industry differs from industry and agriculture: its products are services, which are intangible, non-storable, produced and consumed simultaneously, and active. These "four characteristics" mean the service industry must take standardization as its premise and foundation. Without effective service standards, service behavior cannot be digitized; without digitized service behavior, it is hard to use Internet technology to build a declarative operating interface for it; without a declarative operating interface, the cost of technical R&D and connection stays high and scale is hard to reach; and without scale, users are not given enough options to achieve their intent.
Investigate further and you will find that problems caused by missing standards are not confined to the service industry. Standardizing content produced RSS subscription; standardizing application invocation produced cross-app calling capabilities; iOS likewise sets standards for application interoperability and uses Siri to operate the services behind applications, such as reading WeChat messages or opening the health code. So the premise of UI intelligence is service-operation and program-operation capabilities decoupled from the user, supported by standards that provide standardized openness to reduce R&D and access costs, so that user intent can be achieved at scale.
To streamline users' habitual high-frequency operations, applications generally take one of two technical routes: automation or intelligence. Automation goes back to batch files on Windows, the automation in Fireworks and Photoshop, and macros in Excel and in games, where the user orchestrates triggers, processes, and inputs and outputs through whatever capabilities the application exposes. Mobile has similar automation, such as one-click video download for Douyin and Bilibili, especially by sharing into the Safari browser and opening the page there, which can deftly accomplish cumbersome but frequent operations. The other route is intelligence. Compared with server-side computing, on-device computing has the huge advantage of privacy protection and can better recognize and understand user behavior, and therefore user habits. Training on all kinds of real-time data abstracts the patterns into "scenarios"; then, from the user's behavior in a specific scenario, habitual high-frequency operations can be accurately predicted and simplified.
You may ask: even if habitual high-frequency operations are simplified with on-device intelligence, is that essentially any different from automation? No. On-device intelligence here merely generates and recommends automated operations. There is an intelligent component, but the old saying still holds: however much you improve a candle, it will not become a light bulb. What is the technical basis for simplifying habitual operations with on-device intelligence? Still automation, which depends on the application's limited openness; facing "selfish" apps that force users to open them to use their functions, automation is helpless. In the future, we should think about how to let a digital, intelligent assistant do the tedious things on the user's behalf, making the terminal more efficient to use.
In a structurally complex system, complexity does not disappear; it only moves. So when facing structurally complex problems, we must move the complexity into an area we can reasonably control. A user operating a mobile terminal is likewise structurally complex: from mobile operating systems to mobile applications, commercial and market considerations force compromises around generality rather than deep personalization for individuals, because although such an experience could be superb, the cost is too high. For example, during the epidemic outbreak in Gongshu District, every day I opened Toutiao, tapped the anti-epidemic tab, scrolled to the Hangzhou epidemic item, tapped it, and checked the latest developments, all through those hard days of eating instant noodles at home. In a dynamically changing world, our habitual high-frequency operations keep changing. Only if an assistant app can, like a secretary, understand the world, understand me, and understand the problems I face can it give truly effective assistance in this structurally complex situation.
The utility of an assistant app is proportional to its ability to understand, which is proportional to its on-device intelligence capability; on-device intelligence capability is proportional to the model algorithms and the amount of user data available for training; the amount of usable user data is proportional to privacy-protection capability; and privacy-protection capability is proportional to the model algorithms and the computing power on the device. Therefore, the keys to an assistant app's utility are privacy protection and on-device computing power.
Given privacy protection and computing power, how do we build an assistant app that really assists? Consider Maslow's hierarchy of needs: physiological, safety, social needs, esteem, and self-actualization. By a simple analogy, obtaining information has become as physiological a need as eating and sleeping, and safety is an especially urgent part of information acquisition. So serving the simple needs of physiology and safety is the first step toward being a good assistant app that provides effective help. With effective assistance on physiology and safety, the assistant app gains the trust of the person it assists, and this matters greatly. On a foundation of trust, the assistant reminding the user to call family when appropriate does not feel awkward; on the contrary, the user increasingly feels the assistant is warm and emotional rather than a cold robot. Moving outward from acquaintances (such as family) to strangers' social interaction, esteem, and self-actualization, the user's local, personal information becomes less and less sufficient: the assistant app needs more external information and an understanding of common sense and knowledge to assist effectively with these higher-level, complex needs. In any case, with trust established, users will accept adding external information and knowledge to the assistant app just as naturally as adding a thesaurus to a dictionary app or an input method. (The transparency and security of this external data remain very important.)
Of course, external input is only one part of serving users' higher-level, complex needs; the other part is that the assistant app gradually becomes personified. Once we can design and build an assistant app like a competent private secretary, the user's terminal device will also change enormously, because the interaction changes. Why? Users no longer have to deal with cold machines and UI: they interact with their "private secretary" assistant app, and since it is personified, they can communicate in natural language or even with their eyes. Moreover, since the assistant app is itself a program, its interaction with the digital world can be more direct and efficient: there is no need for protocols that parse binary data into text, images, audio, and video for a human; it can exchange binary directly. This directness reduces the computing load on the terminal device: beyond parsing and processing data, the whole cycle of rendering UI, listening for user input, processing it, and responding with output is omitted, a huge benefit for miniaturized, wearable devices.
To sum up, UI intelligence will evolve toward no UI; applications will evolve toward service-orientation with binary input and output; and the mobile operating system will converge on a single app, the user's private agent, with all other apps becoming services that interact with the agent in binary while the user returns to nature, society, and life. In the future you may carry out most of what today's phone does with only glasses or earphones. As the implementers of user interaction, front-end engineers keep building and reserving intelligence capability; when the new technological change arrives, we will be the main force building UI intelligence.

