Introduction
Reasons people build distributed systems:
- parallelism
- fault tolerance
- physical reasons
- security / isolation
In 6.824, we'll focus on the first two points.
Challenges:
- concurrency
- partial failure
- performance
Infrastructure:
- storage
- communication
- computation
Implementation:
- RPC, threads, concurrency control
Performance:
- scalability (2x resources ----> 2x throughput); it takes careful design to actually get that performance
Fault tolerance:
- availability
- recoverability: non-volatile storage (RAID, write-ahead log) and replication (keeping replicas in sync and managing them is pretty complex)
Consistency:
- we update several copies (for fault tolerance); if a crash hits partway through, different copies can return different values
- weak consistency
- strong consistency (expensive)
Replicas should fail independently of each other.
MapReduce
input --> split into a bunch of chunks --> call Map() for every input chunk (file) --> intermediate output: a list of key-value pairs
--> collect all values corresponding to the same key --> call Reduce() for every key --> [key, totalVal] for each call.
The whole computation is called a job.
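A minimal sketch of the pipeline above, assuming a word-count job run sequentially in one process. The names KeyValue, Map, Reduce and RunJob are chosen here for illustration; a real MapReduce job spreads the Map and Reduce calls across many machines.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// KeyValue is one intermediate pair emitted by Map.
type KeyValue struct {
	Key   string
	Value string
}

// Map emits ("word", "1") for every word in one input chunk.
func Map(contents string) []KeyValue {
	var kvs []KeyValue
	for _, w := range strings.Fields(contents) {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// Reduce receives every value collected for one key and returns the total.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}

// RunJob runs the whole job in one process:
// Map per chunk, group values by key, Reduce per key.
func RunJob(chunks []string) map[string]string {
	grouped := make(map[string][]string)
	for _, c := range chunks {
		for _, kv := range Map(c) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	out := make(map[string]string)
	for k, vs := range grouped {
		out[k] = Reduce(k, vs)
	}
	return out
}

func main() {
	result := RunJob([]string{"a b a", "b c"})
	keys := make([]string, 0, len(result))
	for k := range result {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Println(k, result[k])
	}
}
```

Each Map call only sees its own chunk and each Reduce call only sees one key, which is what lets the calls run in parallel.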
Go
Good support for threads, locking, and synchronization; convenient RPC.
Type safe, memory safe.
Garbage collected.
Threads
In Go, threads are called goroutines.
Each thread has its own stack (all in the same memory address space).
Reasons to use threads:
- I/O concurrency
- parallelism
- convenience (e.g. a goroutine that sleeps 1 sec, then fires a check on whether the workers are still alive)
Alternative: event-driven (async) programming with a single thread of control; it gives I/O concurrency but no CPU parallelism.
Locks
The language knows nothing about the relationship between a lock and the variables it protects! Locks work only because programmers know which variables should be protected and lock around every access. It's totally up to us programmers to decide what we will use each lock to protect.
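A small sketch of this point, assuming a hypothetical Counter type: the mutex and the field it guards sit next to each other only by convention, and nothing stops code from touching n without holding mu.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter pairs a mutex with the field it protects. The pairing is only a
// convention: the compiler does not check that n is accessed under mu.
type Counter struct {
	mu sync.Mutex
	n  int // protected by mu (by convention only)
}

// Inc adds one to the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// CountConcurrently has `workers` goroutines each call Inc once
// and returns the final count after all of them have finished.
func CountConcurrently(workers int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(CountConcurrently(1000)) // prints 1000
}
```

If Inc skipped the Lock/Unlock pair, the program would still compile; running it with `go run -race` would report the data race.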
Coordination
- channels
- sync.Cond (condition variables)
- sync.WaitGroup
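A short sketch of the three tools, with function names (SumSquares, WaitForCount) made up for this example. SumSquares collects results over a channel and uses a sync.WaitGroup to learn when all producers are done; WaitForCount uses a sync.Cond to sleep until a shared counter reaches a target.

```go
package main

import (
	"fmt"
	"sync"
)

// SumSquares squares each number in its own goroutine, sends the results
// over a channel, and uses a WaitGroup to know when to close the channel.
func SumSquares(nums []int) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}
	// Close the channel once every producer has finished,
	// so the range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()
	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

// WaitForCount starts `target` goroutines that each increment a shared
// counter, and blocks on a condition variable until all have done so.
func WaitForCount(target int) int {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	count := 0
	for i := 0; i < target; i++ {
		go func() {
			mu.Lock()
			count++
			mu.Unlock()
			cond.Broadcast() // wake the waiter so it re-checks count
		}()
	}
	mu.Lock()
	defer mu.Unlock()
	for count < target { // always re-check the condition after waking
		cond.Wait() // releases mu while asleep, re-acquires on wake
	}
	return count
}

func main() {
	fmt.Println(SumSquares([]int{1, 2, 3})) // prints 14
	fmt.Println(WaitForCount(5))            // prints 5
}
```

Channels suit passing values between goroutines, while sync.Cond suits waiting for some shared state (already protected by a lock) to change.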