If, ten years ago, you had described face recognition, autonomous driving, and conversational robots to others, you might have been regarded as a lunatic. Today, however, with the development of AI technology, all of this is gradually becoming reality.
Even five years ago, when Youdao launched the Youdao Neural Network Translation Engine (YNMT) and translation quality took a qualitative leap, people still had doubts about machine translation. Today, people are instead debating whether the translation profession will still exist in the future.
The past decade has been one of rapid development for artificial intelligence, and one in which AI moved from the laboratory into industry. I recently read a long article by Jeff Dean, a benchmark figure in the AI field and the head of Google's AI efforts; it resonated with me and gave me new insights, and it leaves me looking forward to devoting myself to the development of AI technology for the next decade as well.
We used the Youdao Neural Network Translation Engine (YNMT) to translate the full text for readers, so you can also get a sense of the quality of today's machine translation.
—— Duan Yitao, Chief Scientist of NetEase Youdao
Abstract
Since the dawn of computers, humans have dreamed of creating "thinking machines." In 1956, John McCarthy organized a workshop at Dartmouth College, where a group of mathematicians and scientists gathered to study how to make machines "use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves." The workshop participants were optimistic that a few months of concentrated effort would yield real progress on these problems.
The several-month timeline proved too optimistic. Over the next 50 years, a variety of approaches to creating AI systems came in and out of fashion, including logic-based systems, rule-based expert systems, and neural networks. Approaches that encoded logical rules about the world and used those rules proved ineffective. Hand-curating millions of pieces of human knowledge into machine-readable form, with the Cyc project as the most prominent example, turned out to be a very labor-intensive undertaking that made no significant headway in enabling machines to learn on their own from real-world experience. Artificial neural networks, inspired by biological neural networks, seemed a promising approach throughout this period, but ultimately fell out of favor in the 1990s. While they could produce impressive results on toy-scale problems, they were unable to produce interesting results on real-world problems at the time.
As an undergraduate in 1990, I was fascinated by neural networks: they seemed like the right abstraction for creating intelligent machines, and I believed we just needed more computing power so that larger neural networks could tackle larger, more interesting problems. I did an undergraduate thesis on parallel training of neural networks, believing that if we could train a network using 64 processors instead of one, neural networks could solve more interesting tasks. As it turned out, though, relative to the computers of 1990 we needed about a million times more computing power, not 64 times, before neural networks could start making impressive progress on challenging problems!
However, starting around 2008, thanks to Moore's Law, we began to have computers that powerful, and neural networks started to revive and emerge as the most promising way to create computers that can see, hear, understand, and learn (the approach was rebranded as "deep learning").
The decade from 2011 to the time of this writing (2021) has seen remarkable progress toward the goals set out at the 1956 Dartmouth workshop. Machine learning (ML) and artificial intelligence have made tremendous advances in many fields, created opportunities for new computing experiences and interactions, and greatly expanded the set of problems the world can solve.
This article focuses on three areas: the computing hardware and software systems that have driven this progress; some exciting examples of machine learning applications from the past decade; and how we might create more powerful machine learning systems, in order to truly reach the goal of creating intelligent machines.
1. AI hardware and software
Unlike general-purpose computer code, such as the software you use every day when you run a word processor or web browser, deep learning algorithms are generally built by composing a small number of linear algebra operations in different ways: matrix multiplications, vector dot products, and similar operations. Because of this restricted vocabulary of operations, it is possible to build computers or accelerator chips specialized for this type of computation. This specialization enables new efficiencies and design choices relative to general-purpose central processing units (CPUs), which must run a much wider variety of algorithms.
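To make this concrete, here is a minimal NumPy sketch (the layer sizes and data are invented for illustration) showing that the forward pass of a small fully connected network is essentially a couple of matrix multiplications plus a cheap elementwise nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 32 flattened 28x28 inputs and two dense layers: almost all of the
# work is matrix multiplication, plus an elementwise nonlinearity (ReLU).
x = rng.normal(size=(32, 784))
W1, b1 = 0.01 * rng.normal(size=(784, 256)), np.zeros(256)
W2, b2 = 0.01 * rng.normal(size=(256, 10)), np.zeros(10)

h = np.maximum(x @ W1 + b1, 0.0)   # matrix multiply, add bias, ReLU
logits = h @ W2 + b2               # another matrix multiply
print(logits.shape)                # (32, 10)
```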
In the early 2000s, some researchers began to study the use of graphics processing units (GPUs) to implement deep learning algorithms. Although these devices were originally designed for drawing graphics, the researchers found that they were also well suited for deep learning algorithms because of their relatively high floating-point computation rates compared to CPUs. In 2004, computer scientists Kyoung-Su Oh and Keechul Jung demonstrated a nearly 20-fold improvement in neural network algorithms using GPUs. In 2008, computer scientist Rajat Raina and colleagues showed that using GPUs was 72.6 times faster than the best CPU implementations of some unsupervised learning algorithms.
These early achievements continued to build as neural networks trained on GPUs outperformed other methods in a wide variety of computer vision competitions. As deep learning methods produced dramatic improvements in image recognition, speech recognition, and language understanding, and as increasingly computationally intensive models (trained on larger datasets) continued to show better results, the field of machine learning really began to take off. Computer system architects started to investigate how to scale deep learning models to even greater computational intensity. One early approach used a large-scale distributed system to train a single deep learning model. Google researchers developed the DistBelief framework, a software system capable of training a single neural network using large-scale distributed computing. Using DistBelief, researchers were able to train a single unsupervised neural network model that was two orders of magnitude larger than previous neural networks. The model was trained on a large collection of random frames from YouTube videos, and with a large network and enough computation and training data, it demonstrated that a single artificial neuron in the model (the building block of a neural network) could learn to recognize high-level concepts such as a human face or a cat, despite never being given any information about these concepts other than the raw pixels of the images.
These successes prompted system designers to create computing devices even better suited to the needs of deep learning algorithms than GPUs. For building specialized hardware, deep learning algorithms have two very convenient properties. First, they are very tolerant of reduced precision. Unlike many numerical algorithms that require 32-bit or 64-bit floating-point representations to guarantee numerical stability, deep learning algorithms are generally fine using 16-bit floating-point representations during training (the process by which a neural network learns from observations) and 8-bit or even 4-bit integer fixed-point representations during inference (the process by which a neural network generates predictions or other outputs from inputs). Using lower-precision multipliers means more multipliers fit in the same chip area, so the chip can perform more calculations per second than it could with higher-precision multipliers. Second, the computations required by deep learning algorithms consist almost entirely of different sequences of linear algebra operations on dense matrices or vectors, such as matrix multiplications or vector dot products. It follows that building chips and systems specialized for low-precision linear algebra can yield large benefits in performance per dollar and performance per watt. An early chip in this vein was Google's first Tensor Processing Unit (TPUv1), which targeted 8-bit integer computations for deep learning inference and delivered an order-of-magnitude improvement in speed and performance over contemporary CPUs and GPUs. Deploying these chips allowed Google to achieve significant improvements in speech recognition accuracy, language translation, and image classification systems. Later TPU systems consist of custom chips connected by high-speed custom networks into larger-scale systems called pods (large-scale supercomputers used to train deep learning models). GPU manufacturers such as NVIDIA began tailoring later designs toward lower-precision deep learning computations, and venture-funded startups sprang up to build a variety of deep learning accelerator chips, with GraphCore, Cerebras, SambaNova, and Nervana among the best known.
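To illustrate the reduced-precision point, the sketch below quantizes float32 weights to 8-bit integers with a simple symmetric linear scheme. This is a toy approximation, not how any particular accelerator or toolchain does it; production systems use more elaborate calibration and per-channel scales:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization: map float32 weights onto int8 values."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).max()
print(f"max quantization error: {error:.6f}")  # small relative to the weights
```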
With the rise of GPUs and other ML-oriented hardware, researchers developed open-source software frameworks that make it easy to express deep learning models and computations. These software frameworks remain key enablers: today, open-source frameworks help a broad range of researchers, engineers, and others advance deep learning research and apply deep learning to a very wide range of problem domains (many of which are discussed below). Some of the earliest frameworks, such as Torch, first developed in 2003, drew inspiration from earlier mathematical tools like MATLAB and NumPy. Theano, developed in 2010, was an early deep learning framework that included automatic symbolic differentiation. Automatic differentiation is a useful tool that greatly simplifies the expression of many gradient-based machine learning algorithms, such as stochastic gradient descent (a technique that corrects errors in a model's output by comparing actual and expected outputs and making small adjustments to the model parameters in the direction of the error gradient). DistBelief and Caffe are frameworks developed in the early 2010s with an emphasis on scale and performance.
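For readers unfamiliar with stochastic gradient descent, here is a minimal hand-derived sketch for a linear model with a squared-error loss; the data and learning rate are invented for illustration, and the hand-written gradient line is exactly the bookkeeping that automatic differentiation removes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (invented for illustration).
true_w = np.array([3.0, -2.0])
x = rng.normal(size=(256, 2))
y = x @ true_w + 0.1 * rng.normal(size=256)

w, lr = np.zeros(2), 0.05
for step in range(200):
    batch = rng.choice(len(x), size=16, replace=False)   # a small random batch
    err = x[batch] @ w - y[batch]                        # actual minus expected output
    grad = 2.0 * x[batch].T @ err / len(batch)           # hand-derived gradient of squared error
    w -= lr * grad                                       # small step against the gradient
print(w)   # approaches [3.0, -2.0]
```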
TensorFlow is a framework for expressing machine learning computations that was developed and open-sourced by Google in 2015. Incorporating ideas from earlier frameworks such as Theano and DistBelief, TensorFlow was designed to target a wide variety of systems, allowing ML computations to run on desktop computers, mobile phones, large-scale distributed environments in data centers, and web browsers, and to target a wide variety of computing devices, including CPUs, GPUs, and TPUs. The system has been downloaded more than 50 million times and is one of the most popular open-source software packages in the world, enabling the large-scale use of machine learning by individuals and organizations of all sizes around the world.
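As a small taste of how a model is expressed in TensorFlow, here is a minimal Keras definition of an image classifier. The layer sizes, the 28x28 input shape, and the commented-out training call are illustrative assumptions, not taken from the article:

```python
import tensorflow as tf

# A small image classifier expressed with the Keras API.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),   # one logit per class
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()
# model.fit(images, labels, epochs=5)  # train once data is available
```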
Released in 2018, JAX is a popular open-source Python library that combines sophisticated automatic differentiation with an underlying XLA compiler (which TensorFlow also uses) to efficiently map machine learning computations onto a variety of different types of hardware.
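A brief sketch of the JAX style just described: `jax.grad` derives the gradient of a plain Python loss function automatically, and `jax.jit` compiles the result through XLA. The toy regression problem is invented for illustration:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    """Mean squared error of a linear model, written as ordinary array code."""
    return jnp.mean((x @ w - y) ** 2)

# jax.grad derives the gradient automatically; jax.jit compiles it via XLA.
grad_fn = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 3))
true_w = jnp.array([1.0, -2.0, 0.5])
y = x @ true_w

w = jnp.zeros(3)
for _ in range(200):
    w = w - 0.1 * grad_fn(w, x, y)
print(w)   # approaches true_w
```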
The importance of open-source machine learning libraries and tools like TensorFlow and PyTorch cannot be overstated. They allow researchers to quickly express and try out ideas, and as researchers and engineers around the world build on each other's work more easily, progress across the whole field accelerates!
2. Research explosion
With the advancement of research, the increasing computing power of ML-oriented hardware such as GPUs and TPUs, and the widespread adoption of open-source machine learning tools such as TensorFlow and PyTorch, there has been a dramatic increase in research output in machine learning and its applications. A strong indicator is the number of papers posted to machine learning-related categories on arXiv, a popular preprint hosting service: more than 32 times as many paper preprints were posted in 2018 as a decade earlier (a growth rate of more than doubling every two years). Today, more than 100 machine learning-related research papers are posted to arXiv every day, and this growth shows no signs of slowing down.
3. Application Explosion
The transformative growth in computing power, advances in machine learning hardware and software systems, and the surge in machine learning research have all led to a proliferation of machine learning applications across many fields of science and engineering. By collaborating with experts in key fields such as climate science and healthcare, machine learning researchers are helping to solve important problems that benefit society and advance human progress. We truly live in exciting times.
Neuroscience is an important field where machine learning accelerates scientific progress. In 2020, researchers studied the brain of a fly to learn more about how the human brain works. They built a connectome, a map of the entire fly brain at the level of synaptic resolution, but without machine learning and the computing power we have now, this would have taken many years. For example, in the 1970s, researchers spent about a decade painstakingly mapping the roughly 300 neurons in a worm's brain. By comparison, the fly brain has 100,000 neurons, and the mouse brain (the next target for machine learning-assisted connectomics) has about 70 million neurons. The human brain contains about 85 billion neurons, each with about 1,000 connections. Fortunately, advances in deep learning-based computer vision technology can now speed up this previously gargantuan process. Today, thanks to machine learning, you can explore the brain of a fly yourself using interactive 3D models!
3.1 Molecular Biology
Machine learning can also help us learn more about our genetic makeup and ultimately address gene-based diseases more effectively. These new technologies allow scientists to explore the space of potential experiments more quickly through more precise simulation, estimation, and data analysis. An open-source tool called DeepVariant can more accurately process the raw information coming from a DNA sequencing machine (which contains errors introduced by the physical process of reading the gene sequence), using a convolutional neural network to more accurately identify the true genetic variants in a sequence relative to a reference genome. Once genetic variants have been identified, deep learning can also help analyze genetic sequences to better understand the genetic signatures of single or multiple DNA mutations that lead to particular health or other outcomes. For example, a study led by the Dana-Farber Cancer Institute improved the diagnostic yield for genetic variants that cause prostate cancer and melanoma by 14% in a cohort of 2,367 cancer patients.
3.2 Healthcare
Machine learning also offers new ways to help detect and diagnose disease. For example, when applied to medical images, computer vision can help doctors diagnose some serious conditions faster and more accurately than they could on their own.
An impressive example is the ability of deep neural networks to correctly diagnose diabetic retinopathy, often on par with human ophthalmologists. This eye disease is the fastest growing cause of preventable blindness (expected to affect 642 million people by 2040).
Deep learning systems can also help detect lung cancer as well as, or better than, trained radiologists. The same is true for breast cancer, skin diseases, and other conditions. Applying sequential prediction to medical records can help clinicians determine likely diagnoses and the risk levels of chronic diseases.
Today's deep learning technologies are also giving us a more accurate picture of how diseases spread, and thus a better chance of preventing them. Machine learning helps us model complex events such as the global COVID-19 pandemic, which requires comprehensive epidemiological datasets, the development of new interpretable models, and agent-based simulators to inform public health responses.
3.3 Weather, environment and climate change
Climate change is one of the greatest challenges facing humanity today. Machine learning can help us better understand weather and the environment, especially when it comes to predicting everyday weather and climate hazards.
When it comes to weather and precipitation forecasting, computationally intensive physics-based models, such as NOAA's High Resolution Rapid Refresh (HRRR), have long dominated. Machine learning-based forecasting systems, however, can already make more accurate predictions than HRRR on short time scales, with better spatial resolution and faster forecast computation.
For flood forecasting, neural networks can model river systems around the world (a technique known as HydroNets), resulting in more accurate water-level forecasts. Using this technology, for example, authorities have been able to issue flood warnings more quickly to more than 200 million people in India and Bangladesh.
Machine learning can also help us better analyze satellite imagery. We can quickly assess damage following natural disasters (even with limited prior satellite imagery), understand the impact and extent of wildfires, and improve ecological and wildlife monitoring.
3.4 Robotics
The physical world is messy: full of unexpected obstacles, slips, and breakage. This makes it quite challenging to create robots that can operate successfully in cluttered real-world environments such as kitchens, offices, and roads (industrial robots have already had a major impact on the world, but they operate in more controlled environments such as factory assembly lines). To hand-code or program real-world physical tasks, researchers would need to anticipate every possible situation a robot might encounter. Machine learning offers an effective way to train robots to operate in real-world environments by combining techniques such as reinforcement learning, learning from human demonstration, and natural-language instruction. It also provides a more flexible, adaptable approach in which robots can learn the best ways to perform tasks such as grasping or walking, rather than being locked into hard-coded assumptions.
Some interesting research directions include combining automated reinforcement learning with long-range robot navigation, teaching robots to follow natural language instructions (in multiple languages!), and applying zero-shot imitation learning frameworks to help robots navigate both simulated and real environments better.
3.5 Accessibility
It's easy to take for granted seeing beautiful pictures, hearing a favorite song, or talking to a loved one. However, more than a billion people do not have access to the world in these ways. Machine learning improves accessibility by transforming these signals (visual, auditory, speech) into other signals that people with accessibility needs can manage well, allowing people to better engage with the world around them. Some examples of applications include speech-to-text transcription, real-time transcription when someone is engaged in a conversation, and applications that help visually impaired users identify their surroundings.
3.6 Personalized Learning
Machine learning can also be used to create tools and applications that help personalize learning, with far-reaching benefits. Early examples include early-childhood reading tutors such as Google Read Along (formerly Bolo), which is helping children around the world learn to read in a variety of different languages, and machine learning tools such as Socratic, which can help students by giving them intuitive explanations and more detailed information about the concepts they are grappling with, in subjects such as mathematics, chemistry, and literature. Personalized learning supported by speech recognition, realistic speech output, and language understanding has the potential to improve educational outcomes around the world.
3.7 Computer-Assisted Creativity
Deep learning algorithms have shown an amazing ability to transform images in complex and creative ways, allowing us to easily create a Monet-style spaceship or an Edvard Munch-style Golden Gate Bridge. Through an art style transfer algorithm (developed by machine learning researcher Leon Gatys and colleagues), the neural network can take a real-world image and an image of a painting and automatically render the real-world image in the painter's style.
OpenAI's DALL·E lets users describe images using text ("an armchair in the shape of an avocado" or "a loft bedroom with a white bed next to a bedside table, with a fish tank beside the bed") and generates images that express the properties of the natural-language description, giving artists and other creators sophisticated tools for quickly creating the images they have in mind.
Machine learning-powered tools are also helping musicians create in unprecedented ways. Going beyond "tech," these new uses of computing can help anyone create new and unique sounds, rhythms, melodies, or even a whole new kind of musical instrument.
It's not hard to imagine a future in which, simply by interacting with a computer assistant, tools can interactively help people create compelling representations of their mental imagery: "Draw me a beach... no, I want it to be nighttime... with a full moon... and a mother giraffe with a baby next to a surfer coming out of the water."
3.8 Federated Learning
Federated learning is a powerful machine learning approach that preserves user privacy by leveraging many different clients (such as mobile devices or organizations) to collaboratively train a model while keeping the training data decentralized. This makes methods with superior privacy properties possible.
Researchers continue to advance the state of the art in federated learning by developing adaptive learning algorithms, techniques to mimic centralized algorithms in federated settings, substantial improvements to complementary cryptographic protocols, and more.
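To illustrate the basic idea, here is a toy sketch of federated averaging on a linear-regression task: each simulated client trains locally on its own data, and the server only ever sees the resulting model weights, which it averages. Real systems add secure aggregation, weighting by client data size, and much more; everything here is an invented example:

```python
import numpy as np

def local_update(global_w, x, y, lr=0.1, steps=5):
    """One client's local training on its private data (linear regression here)."""
    w = global_w.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=20):
    """The server only sees model weights, never the clients' raw data."""
    for _ in range(rounds):
        updates = [local_update(global_w, x, y) for x, y in clients]
        global_w = np.mean(updates, axis=0)      # average the client models
    return global_w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):                               # five simulated clients
    x = rng.normal(size=(20, 2))
    clients.append((x, x @ true_w + 0.01 * rng.normal(size=20)))

w = federated_averaging(np.zeros(2), clients)
print(w)   # close to [2.0, -1.0]
```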
3.9 Transformers
Since the inception of the field of artificial intelligence, language has been at the heart of its development, because the use and understanding of language is ubiquitous in our daily lives. Because language involves symbols, it was natural for AI to take a symbolic approach at first. But over the years, AI researchers have come to realize that more statistical or pattern-based approaches yield better practical results. The right kinds of deep learning can efficiently represent and manipulate the hierarchical structure of language for a variety of real-world tasks, from translation between languages to image captioning. Much of the work in this area at Google and elsewhere now relies on Transformers, a particular style of neural network model originally developed for language problems (though there is growing evidence that they are also useful for images, video, speech, protein folding, and many other domains).
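For a concrete glimpse of the core Transformer operation, here is a toy NumPy implementation of scaled dot-product self-attention over a short sequence. The dimensions and random weights are arbitrary; real Transformers add multiple heads, masking, feed-forward layers, and many stacked blocks:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every position, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over positions
    return weights @ V                                    # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                               # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                                          # (4, 8)
```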
There have been several interesting examples of using Transformers in scientific settings, such as training on protein sequences to find representations that encode meaningful biological properties, generating proteins via language modeling, bio-BERT for text mining in biomedical data (with pre-trained models and training code), embedding scientific text (with code), and medical question answering. Computer scientists Maithra Raghu and Eric Schmidt have provided a comprehensive review of deep learning approaches to scientific discovery.
3.10 Machine Learning for Computer Systems
Researchers have also applied machine learning to core computer science problems and to computer systems themselves. This is an exciting virtuous circle for machine learning and computing infrastructure research, because it can accelerate all the techniques we apply to other fields. Indeed, the trend is spawning entirely new conferences, such as MLSys. Learning-based approaches are even being applied to database indexing, learned sorting algorithms, compiler optimization, graph optimization, and memory allocation.
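As one hedged illustration of a learning-based approach to database indexing, the sketch below fits a simple linear model that maps a key to its approximate position in a sorted array, records the model's maximum error at build time, and then corrects each lookup with a bounded binary search. Real learned indexes use staged or neural models and careful engineering; this is only the general shape of the idea, with invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.uniform(0, 1_000_000, size=10_000))    # sorted index keys
positions = np.arange(len(keys))

# Fit a tiny model mapping key -> approximate position, and record its worst
# error at build time so lookups can fall back to a bounded binary search.
slope, intercept = np.polyfit(keys, positions, 1)
pred = np.clip(slope * keys + intercept, 0, len(keys) - 1)
max_error = int(np.ceil(np.abs(pred - positions).max()))

def learned_lookup(key):
    guess = int(np.clip(slope * key + intercept, 0, len(keys) - 1))
    lo = max(0, guess - max_error)
    hi = min(len(keys), guess + max_error + 1)
    return lo + int(np.searchsorted(keys[lo:hi], key))    # search only the error window

print(learned_lookup(keys[1234]), max_error)              # 1234, plus the error bound
```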
4. The future of machine learning
There are some interesting research threads emerging in the ML research community that might be even more interesting if they were combined.
First, work on sparsely activated models, such as the sparsely gated mixture-of-experts model, shows how to build very large-capacity models in which, for any given example, only a portion of the model is "activated": say, just 2 or 3 experts out of 2,048. The routing function in such models is trained jointly with the different experts, so that it learns which experts are good at which kinds of examples while the experts simultaneously learn to specialize in the characteristics of the stream of examples routed to them. This stands in stark contrast to most ML models today, in which every example activates the entire model. Research scientist Ashish Vaswani and colleagues showed that this approach is about 9 times more efficient for training and about 2.5 times more efficient for inference, while also being more accurate (+1 BLEU point, a relatively large accuracy improvement for language translation tasks).
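The following toy sketch shows the sparse-activation idea just described: a linear gating function scores the experts for each example, and only the top two experts (out of eight here, rather than 2,048) actually run. The sizes and the random single-matrix "experts" are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

# Toy "experts" (each a single weight matrix) and a linear gating function.
experts = [0.1 * rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate = 0.1 * rng.normal(size=(d_model, n_experts))

def moe_forward(x):
    """Each example activates only its top-k experts; the others do no work."""
    logits = x @ gate                                  # (batch, n_experts) routing scores
    out = np.zeros_like(x)
    for i, example in enumerate(x):
        chosen = np.argsort(logits[i])[-top_k:]        # indices of the top-k experts
        w = np.exp(logits[i, chosen])
        w /= w.sum()                                   # softmax over the chosen experts only
        for weight, e in zip(w, chosen):
            out[i] += weight * (example @ experts[e])  # only k expert matmuls per example
    return out

batch = rng.normal(size=(4, d_model))
print(moe_forward(batch).shape)                        # (4, 16)
```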
Second, work on automated machine learning (AutoML), in which techniques such as neural architecture search or evolutionary architecture search can automatically learn effective structures and other aspects of machine learning models or components in order to optimize accuracy for a given task. AutoML typically involves running many automated experiments, each of which can itself involve significant amounts of computation.
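A minimal sketch of the outer loop these AutoML methods share, using simple random search over a made-up configuration space and a placeholder evaluation function standing in for the expensive train-and-validate step (neural architecture search and evolutionary search replace the sampling strategy, not this overall loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up configuration space; real searches cover layer types, connectivity, etc.
search_space = {
    "num_layers": [2, 4, 8],
    "width": [64, 128, 256, 512],
    "activation": ["relu", "gelu", "swish"],
}

def evaluate(config):
    # Placeholder: in practice this trains the candidate model and returns
    # its validation accuracy, which is the expensive part of AutoML.
    return float(rng.uniform())

best_config, best_score = None, -np.inf
for _ in range(20):                                   # 20 automated "experiments"
    config = {name: rng.choice(values) for name, values in search_space.items()}
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)
```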
Third, multi-task training at a modest scale of a few to a few dozen related tasks, or transfer learning from a model trained on a large amount of data for one related task and then fine-tuned on a small amount of data for a new task, has been shown to be very effective for a wide variety of problems. So far, most applications of multi-task machine learning have been in the context of a single modality (for example, all vision tasks or all text tasks), although a few authors have considered multimodal settings as well.
A particularly interesting research direction combines all three of these trends: running a system on large-scale ML accelerator hardware with the goal of training a single model that can perform thousands or millions of tasks. Such a model might be composed of many different components with different structures, with data flowing between components on a relatively dynamic, example-by-example basis. The model might use techniques such as the sparsely gated mixture of experts and learned routing to achieve a very large capacity, while any given task or example sparsely activates only a small fraction of the total components in the system (thereby keeping the computational cost and power consumption per training example or inference much lower). An interesting direction to explore is using dynamic and adaptive amounts of computation for different examples, so that "easy" examples use much less computation than "hard" ones (a relatively unusual property among today's machine learning models). Figure 1 depicts such a system.
Each component might itself run an AutoML-like architecture search to adapt its structure to the kind of data being routed to it. New tasks can leverage components trained on other tasks when that is useful. The hope is that with very large-scale multi-task learning, shared components, and learned routing, the model can very quickly learn to perform new tasks to a high level of accuracy from relatively few examples of each new task, because the model is able to draw on the expertise and internal representations it has already developed in completing other, related tasks.
Building a single machine learning system that can handle millions of tasks and learn to complete new tasks successfully on its own is a truly grand challenge for the fields of artificial intelligence and computer systems engineering. It will require expertise and advances in many areas, spanning machine learning algorithms, responsible AI topics such as fairness and interpretability, distributed systems, and computer architecture, so that we can build a system that generalizes to solve new tasks independently and, in doing so, advance the field of artificial intelligence.
4.1 Responsible AI Development
While AI has the power to help us in many aspects of our lives, all researchers and practitioners should ensure that these approaches are developed responsibly: carefully scrutinizing bias, fairness, privacy, and other social considerations about how these tools might behave and affect others, and working to address those considerations appropriately.
It is also important to have a clear set of principles to guide responsible development. In 2018, Google released a set of AI principles that guide the company's work and use of AI. The AI Principles list important areas of consideration, including issues of bias, security, fairness, accountability, transparency, and privacy in machine learning systems. In recent years, other organizations and governments have followed this pattern, publishing their own principles on the use of AI. It's great to see more organizations publishing their own guidelines, and I hope this trend will continue until it's no longer a trend, but a standard for all machine learning research and development.
5. Summary
The 2010s were a golden decade for deep learning research and progress. During this decade, the field has made great strides in some of the most difficult problem areas presented at the symposium that created the field of artificial intelligence in 1956. Machines can see, hear and understand language in the way early researchers hoped.
Success in these core areas has led to tremendous advances in many areas of science, made our smartphones smarter, and opened our eyes to what's possible in the future as we continue to create more complex and powerful deep learning models to help us in our daily lives. With the help of unparalleled machine learning systems, our future will become more creative and capable. I can't wait to see what the future holds!
Author's Note:
Alison Carroll, Heather Struntz, and Phyllis Bendell helped edit this manuscript and provided many helpful suggestions on how to present much of the material.
© 2022 Courtesy of Jeffrey Dean. Released under the CC BY-NC 4.0 license.