The fun and benefits of distributed systems

Original: http://book.mixu.net/distsys/intro.html
Translation: Zhu Kunrong

1. Look at distributed systems at a high level

Distributed programming is the art of using multiple computers to solve the same problem that you can solve on a single computer.

Any computer system needs to accomplish two basic tasks:

storage
computation

Distributed programming is the art of using multiple computers to solve the same problem that you can solve on a single computer, usually because the problem no longer fits on a single computer.

Nothing really forces you to use a distributed system. Given unlimited money and unlimited development time, we would not need one. All computation and storage could be done in a magic box: a single, extremely fast and extremely reliable system that you pay someone else to build for you.

However, few people have unlimited resources, so they have to find a balance between real-world costs and benefits. At a small scale, upgrading hardware is a viable strategy. As the problem grows, though, you will reach a point where the hardware upgrade that would let you solve it on a single node either does not exist or is prohibitively expensive. At that point, welcome to the world of distributed systems.

The current reality is that the best value lies in mid-range commodity hardware, as long as maintenance costs can be kept down through fault-tolerant software.

Computations benefit from high-end hardware mainly to the extent that they can replace slow network accesses with internal memory accesses. For tasks that require a large amount of communication between nodes, the performance advantage of high-end hardware is limited.

http://book.mixu.net/distsys/images/barroso_holzle.png

The figure above, from Barroso, Clidaras & Hölzle's book (https://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024), shows that, assuming a uniform memory access pattern across all nodes, the performance gap between high-end hardware and commodity hardware shrinks as the cluster grows.
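
A back-of-the-envelope model can illustrate why the gap narrows. The sketch below is not taken from the book. It assumes invented latencies (100 ns for a local memory access, 100 µs for an access over the network) and assumes every access is equally likely to land on any core's memory. The function names and the node sizes (128 cores versus 4 cores) are hypothetical.

```python
LOCAL_NS = 100        # assumed latency of a local memory access
REMOTE_NS = 100_000   # assumed latency of an access over the network


def mean_access_ns(cores_per_node: int, total_cores: int) -> float:
    """Average access latency under a uniform access pattern."""
    # With uniform access, the chance that the data lives on the local
    # node equals the share of the cluster's cores that the node holds.
    local_fraction = min(1.0, cores_per_node / total_cores)
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


def high_end_advantage(total_cores: int, big: int = 128, small: int = 4) -> float:
    """How many times faster the average access is on the big-node cluster."""
    return mean_access_ns(small, total_cores) / mean_access_ns(big, total_cores)


if __name__ == "__main__":
    # The advantage is large while one big node holds everything,
    # then falls toward 1 as the cluster grows.
    for total in (128, 256, 1024, 8192):
        print(total, round(high_end_advantage(total), 2))
```

Under these assumptions, once the workload no longer fits on one large node, most accesses go over the network whichever kind of hardware is used, so the ratio approaches 1.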

Ideally, adding a new machine would increase the performance and capacity of the system linearly. In practice this is not possible, because having separate computers introduces overhead: data needs to be copied around, computing tasks need to be coordinated, and so on. This is why distributed algorithms are worth studying. They provide efficient solutions to specific problems, and they offer guidance about what is possible, what the minimum cost of a correct implementation is, and what is impossible.
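
To make the overhead concrete, here is a toy model that is not part of the original text. It assumes each machine contributes one unit of work per second and spends a fixed fraction of extra time coordinating with every other machine. The 1% figure and the function names are invented for illustration.

```python
def ideal_throughput(nodes: int) -> float:
    """Linear scaling: every node adds one full unit of capacity."""
    return float(nodes)


def modeled_throughput(nodes: int, overhead_per_peer: float = 0.01) -> float:
    """Throughput when each node also coordinates with each of its peers."""
    # The share of time left for useful work shrinks as peers are added.
    useful_fraction = 1.0 / (1.0 + overhead_per_peer * (nodes - 1))
    return nodes * useful_fraction


def efficiency(nodes: int, overhead_per_peer: float = 0.01) -> float:
    """Modeled throughput as a fraction of the ideal linear throughput."""
    return modeled_throughput(nodes, overhead_per_peer) / ideal_throughput(nodes)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(modeled_throughput(n), 1), round(efficiency(n), 3))
```

In this model a single node runs at full efficiency, while larger clusters fall further and further below the linear ideal. Real systems have different overhead curves, but the shape of the problem is similar.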

This article focuses on distributed programming and systems in a mundane but commercially relevant setting: the data center. For example, I will not discuss specialized problems that arise from an exotic network configuration, or problems that arise in a shared-memory setting. In addition, the focus is on exploring the system design space rather than on optimizing any specific design; the latter is a much more specialized topic.

What we want to achieve: scalability and other goodies

The way I see it, everything starts with the need to deal with size.

Most things are simple at a small scale, and the same problem becomes much harder once it passes a certain size, whether the limit comes from traffic volume or from some other physical constraint. It's easy to lift a piece of chocolate, but it's hard to lift a mountain. Counting how many people are in a house is easy, but counting how many people are in a country is hard.

So everything starts with size, that is, with scalability. Informally speaking, in a scalable system, things should not get incrementally worse as we move from small to large. Here is another definition:

Scalability (https://en.wikipedia.org/wiki/Scalability) is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth.

So what is it that grows? You can measure growth in almost any terms (number of people, electricity usage, and so on), but three dimensions are particularly worth looking at:

Size scalability: adding more nodes should make the system linearly faster; growing the data set should not increase latency.
Geographic scalability: it should be possible to use multiple data centers to reduce the response time for user queries, while handling the latency between data centers in a sensible manner.
Administrative scalability: adding more nodes should not increase the administrative cost of the system (for example, the ratio of administrators to machines).
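
The size and administrative dimensions above can be turned into simple ratios. The sketch below is an illustration only: the `Snapshot` type, its field names, and the numbers in the usage example are hypothetical, not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One hypothetical observation of a cluster at a given size."""
    nodes: int
    requests_per_sec: float
    latency_ms: float
    administrators: int


def size_efficiency(base: Snapshot, grown: Snapshot) -> float:
    """Throughput gain divided by node gain; 1.0 means linear scaling."""
    throughput_gain = grown.requests_per_sec / base.requests_per_sec
    node_gain = grown.nodes / base.nodes
    return throughput_gain / node_gain


def latency_growth(base: Snapshot, grown: Snapshot) -> float:
    """Latency after growth relative to before; ideally stays near 1.0."""
    return grown.latency_ms / base.latency_ms


def admins_per_machine(snapshot: Snapshot) -> float:
    """Administrators-to-machines ratio; ideally does not rise with size."""
    return snapshot.administrators / snapshot.nodes


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    before = Snapshot(nodes=10, requests_per_sec=1000.0, latency_ms=20.0, administrators=1)
    after = Snapshot(nodes=40, requests_per_sec=3600.0, latency_ms=22.0, administrators=2)
    print(size_efficiency(before, after))
    print(latency_growth(before, after))
    print(admins_per_machine(before), admins_per_machine(after))
```

Geographic scalability is left out of this sketch because it depends on where users and data centers are located, which a single ratio does not capture.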

Of course, in the real world growth happens along many different dimensions at the same time; each metric captures only one aspect of it.

A scalable system is one that continues to meet the needs of its users as scale increases. Two aspects are particularly relevant here, performance and availability, and each can be measured in a variety of ways.

This article is from Zhu Kunrong's WeChat public account "Malt Bread", the public account id "darkjune_think"

Developer / Science Fiction Enthusiast / Hardcore Console Gamer / Amateur Translator
Please credit the source when reprinting.

Weibo: Zhu Kunrong
Bilibili: https://space.bilibili.com/23185593/

Communication Email: zhukunrong@yeah.net
