Currently there are three cluster modes: Standalone, Mesos, and YARN.
Install
First, place a compiled version of Spark on every machine in the cluster; this can be done with a shell script or with Docker.
To launch a standalone cluster manually with the shell scripts, first start the master server with ./sbin/start-master.sh.
Once started, the master prints out a spark://HOST:PORT URL for itself, which workers use to connect.
On each worker machine, connect to the master by running ./sbin/start-slave.sh <master-spark-URL>.
We can then open the master's web UI at http://localhost:8080 to check the current status of the cluster.
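The same check can also be scripted. The small Python sketch below only assumes the default web UI address mentioned above (http://localhost:8080) and simply confirms the UI responds over HTTP; it is a convenience, not part of Spark's own tooling.

import urllib.request

# Poll the standalone master's web UI (started above) to confirm it is reachable.
master_ui = "http://localhost:8080"  # default master web UI address
with urllib.request.urlopen(master_ui, timeout=5) as resp:
    print("Master UI reachable, HTTP status:", resp.status)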
Launch with Scripts
To launch the whole cluster with Spark's launch scripts, go to this link:
Start Application
Pass the spark://IP:PORT URL of the master to the SparkContext constructor:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName(appName).setMaster(masterURL)
sc = SparkContext(conf=conf)
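For context, here is the same pair of lines inside a minimal self-contained program. This is only a sketch: the application name, the master URL (reusing the address from the example further below), and the toy job are illustrative values, not anything prescribed by Spark.

from operator import add
from pyspark import SparkConf, SparkContext

# Hypothetical values for illustration; substitute your own application name and master URL.
appName = "SimpleStandaloneApp"
masterURL = "spark://207.184.161.138:7077"

conf = SparkConf().setAppName(appName).setMaster(masterURL)
sc = SparkContext(conf=conf)

# A trivial job to confirm the context works: sum the integers 1..100 on the cluster.
total = sc.parallelize(range(1, 101)).reduce(add)
print("Sum of 1..100 =", total)

sc.stop()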
Alternatively, launching with spark-submit is recommended.
Launching with spark-submit
After the cluster is installed, applications can be submitted to it using the spark-submit script. The general format is:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
Simplest Example
This example shows how to launch a program with spark-submit:
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000
# pi.py (condensed; the full version ships with Spark as examples/src/main/python/pi.py)
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    n = 100000  # number of random sample points (the full example scales this by the partition argument, e.g. 1000 above)
    # f marks whether a random point in the unit square lands inside the quarter circle
    f = lambda _: 1 if random() ** 2 + random() ** 2 <= 1 else 0
    count = spark.sparkContext.parallelize(range(1, n + 1)).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    spark.stop()