6.824 Notes1 (lec1 && lec2)

Introduction

Reasons people build distributed systems:

parallelism
fault tolerance
physical reasons
security / isolation

In 6.824, we'll focus on the first two points.

Challenges:

concurrency
partial failure
performance

Infrastructure:

storage
communication
computation

Implementation:

RPC, threads, concurrency control

Performance:

scalability (2x resources ----> 2x throughput); the design has to be careful to actually get that performance

Fault tolerance:

availability
recoverability: non-volatile storage (RAID, WAL) and replication (keeping replicas in sync and managing them is pretty complex)

Consistency:

several copies are updated (for fault tolerance); if a crash happens partway through, different copies can return different values
weak consistency
strong consistency (expensive)

replicas should fail independently

MapReduce

input --> a bunch of chunks (files) --> call Map() for every input file --> intermediate output: a list of key-value pairs

--> collect all values corresponding to the same key --> call Reduce() for every key --> one [key, totalVal] for each call

The whole computation is called a job.
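
The flow above can be sketched as a word count that runs sequentially in a single process. This is only an illustration: the names KeyValue, Map, Reduce and RunJob are my own assumptions, and a real MapReduce job would spread the Map and Reduce calls across many machines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// KeyValue is one intermediate pair emitted by Map (hypothetical type).
type KeyValue struct {
	Key   string
	Value string
}

// Map is called once per input file. For word count it emits (word, "1")
// for every word in the file.
func Map(filename, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: strings.ToLower(w), Value: "1"})
	}
	return kva
}

// Reduce is called once per key with all values emitted for that key.
// For word count the total is simply the number of values.
func Reduce(key string, values []string) string {
	return fmt.Sprint(len(values))
}

// RunJob runs the whole job in one process: Map on every input,
// group the intermediate values by key, then Reduce on every key.
func RunJob(inputs map[string]string) map[string]string {
	grouped := make(map[string][]string)
	for name, contents := range inputs {
		for _, kv := range Map(name, contents) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	out := make(map[string]string, len(grouped))
	for key, values := range grouped {
		out[key] = Reduce(key, values)
	}
	return out
}

func main() {
	result := RunJob(map[string]string{
		"a.txt": "the quick brown fox",
		"b.txt": "the lazy dog and the fox",
	})
	keys := make([]string, 0, len(result))
	for k := range result {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Println(k, result[k])
	}
}
```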

Go

Good support for threads, locking, and synchronization; convenient RPC.

Type safe and memory safe.

threads

In Go, threads are called goroutines.

Each thread has its own stack (all in the same memory address space).

Reasons to use threads:

I/O concurrency

parallelism

convenience (e.g. a goroutine that sleeps for 1 second, then checks whether the workers are still alive, and repeats)

The alternative is event-driven (async) programming with a single thread of control: it gives I/O concurrency, but no CPU parallelism.

lock!!!

The language knows nothing about the relationship between a lock and the variables it protects! Locks work only because programmers know which variables must be protected and acquire the lock around every access to them. It is entirely up to us programmers to decide what each lock protects.
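
A minimal sketch of that convention, assuming a made-up Counter type: the only thing tying mu to n is the comment and the discipline of locking in every method. The compiler would happily accept code that touches n without holding mu. (sync.WaitGroup is used here just to wait for the goroutines to finish.)

```go
package main

import (
	"fmt"
	"sync"
)

// Counter pairs a mutex with the field it protects (hypothetical type).
type Counter struct {
	mu sync.Mutex
	n  int // protected by mu, by convention only
}

// Inc holds the lock for the whole read-modify-write of n.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value also takes the lock, so it never reads n in the middle of an update.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// CountConcurrently starts the given number of goroutines, each calling
// Inc perGoroutine times, and returns the final count.
func CountConcurrently(goroutines, perGoroutine int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(CountConcurrently(100, 1000)) // prints 100000
}
```

Without the Lock/Unlock pair in Inc, the increments from different goroutines could interleave and updates would be lost; running with go run -race would report the data race.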

Coordination

channels.

sync.Cond (condition variable)

sync.WaitGroup
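
To compare the three tools, here is one small task (summing squares computed by separate goroutines) written three ways. The function names are made up for this sketch, and which tool fits best depends on the program; this just shows the shape of each.

```go
package main

import (
	"fmt"
	"sync"
)

// SumWithChannel: each goroutine sends its result on a channel and the
// caller receives exactly one value per goroutine.
func SumWithChannel(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(n int) { results <- n * n }(n)
	}
	total := 0
	for range nums {
		total += <-results
	}
	return total
}

// SumWithWaitGroup: the WaitGroup only waits for completion; the shared
// total still needs a mutex.
func SumWithWaitGroup(nums []int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mu.Lock()
			total += n * n
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return total
}

// SumWithCond: the caller sleeps on a condition variable until the
// finished count shows that every goroutine is done.
func SumWithCond(nums []int) int {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	total, finished := 0, 0
	for _, n := range nums {
		go func(n int) {
			mu.Lock()
			defer mu.Unlock()
			total += n * n
			finished++
			cond.Broadcast() // wake the waiter so it re-checks the condition
		}(n)
	}
	mu.Lock()
	defer mu.Unlock()
	for finished < len(nums) { // always re-check the condition in a loop
		cond.Wait() // releases mu while sleeping, re-acquires it on wakeup
	}
	return total
}

func main() {
	nums := []int{1, 2, 3, 4}
	fmt.Println(SumWithChannel(nums), SumWithWaitGroup(nums), SumWithCond(nums)) // prints 30 30 30
}
```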

otto
