Currently Spark supports three cluster manager modes: Standalone, Mesos, and YARN

Install

First, place Spark on every machine in the cluster; this can be done with a shell script or with Docker.
To use the shell scripts that ship with Spark:

To launch the cluster manually, start the master server with ./sbin/start-master.sh
Once started, the master prints out a spark://HOST:PORT URL for itself, which workers use to connect.
On each worker machine, start a worker and connect it to the master with ./sbin/start-slave.sh <master-spark-URL>

Once everything is running, we can check the current status of the cluster on the master's web UI at http://localhost:8080

Build with Script

For details, see this link:

Start Application

Pass the spark://IP:PORT URL of the master to the SparkContext constructor:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(masterURL)
sc = SparkContext(conf=conf)
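
For instance, a minimal self-contained script could look like the sketch below; the app name and master URL are placeholders, so substitute the spark://HOST:PORT that your own master prints:

from pyspark import SparkConf, SparkContext

# Placeholder app name and master URL; use the URL printed by ./sbin/start-master.sh
conf = SparkConf().setAppName("quick-check").setMaster("spark://master-host:7077")
sc = SparkContext(conf=conf)

# Tiny sanity job: sum the integers 1..100 on the cluster (expected output: 5050)
print(sc.parallelize(range(1, 101)).reduce(lambda a, b: a + b))

sc.stop()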

Alternatively, launching the application with the spark-submit script is recommended.

Launching with spark-submit

After the cluster is set up, applications can be submitted to it with the spark-submit script. The general format is:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Simplest Example

This example shows how to launch a Python program with spark-submit:

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# pi.py (abridged; the full source ships with Spark at examples/src/main/python/pi.py)
if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    count = spark.sparkContext.parallelize(range(1, n + 1)).map(f).reduce(add)
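
The abridged snippet above leaves out the imports, the sampling function f, and the sample count n. A complete, runnable sketch of the same Monte Carlo estimator, closely following the pi.py example bundled with Spark, looks roughly like this:

import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()

    # The command-line argument (1000 in the spark-submit call above) sets the number of partitions
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    # Sample a random point in the 2x2 square; count it if it falls inside the unit circle
    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    spark.stop()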
