- Storm Client
A topology is first launched with the storm command (bin/storm), like so:
storm jar storm-starter.jar storm.starter.WordCountTopology
The storm command is implemented in Python. Look at its jar function: it is very simple, just calling exec_storm_class with jvmtype="-client".
exec_storm_class itself just assembles a java command line and runs it with os.execvp (or os.spawnvp when forking). Why write it in Python? Presumably for simplicity, and so that users can invoke storm directly as a command.
The klass here is the topology class, so the java command merely calls the main function of the topology class.
/bin/storm
def exec_storm_class(klass, jvmtype="-server", jvmopts=[], extrajars=[], args=[], fork=False):
    global CONFFILE
    all_args = [
        "java", jvmtype, get_config_opts(),
        "-Dstorm.home=" + STORM_DIR,
        "-Djava.library.path=" + confvalue("java.library.path", extrajars),
        "-Dstorm.conf.file=" + CONFFILE,
        "-cp", get_classpath(extrajars),
    ] + jvmopts + [klass] + list(args)
    print "Running: " + " ".join(all_args)
    if fork:
        os.spawnvp(os.P_WAIT, "java", all_args)
    else:
        os.execvp("java", all_args) # replaces the current process and never returns
def jar(jarfile, klass, *args):
    """Syntax: [storm jar topology-jar-path class ...]

    Runs the main method of class with the specified arguments.
    The storm jars and configs in ~/.storm are put on the classpath.
    The process is configured so that StormSubmitter
    will upload the jar at topology-jar-path when the topology is submitted.
    """
    exec_storm_class(
        klass,
        jvmtype="-client",
        extrajars=[jarfile, USER_CONF_DIR, STORM_DIR + "/bin"],
        args=args,
        jvmopts=[' '.join(filter(None, [JAR_JVM_OPTS, "-Dstorm.jar=" + jarfile]))])
Running the storm jar command above, storm jar storm-starter.jar storm.starter.WordCountTopology, gives:
jarfile = storm-starter.jar, klass = storm.starter.WordCountTopology, -Dstorm.jar = storm-starter.jar
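Tracing that invocation through exec_storm_class, the command assembly can be sketched in a few lines of Python. This is a simplified sketch: STORM_DIR, the classpath, and the omitted get_config_opts()/java.library.path flags are placeholders, not values from a real installation.

```python
# Simplified sketch of the command assembly in bin/storm's exec_storm_class.
# STORM_DIR and the classpath below are hypothetical placeholders.
STORM_DIR = "/opt/storm"
CONFFILE = ""

def build_command(klass, jvmtype="-server", jvmopts=(), args=()):
    # JVM flags first, then the main class, then its arguments --
    # mirroring the list concatenation in exec_storm_class.
    return ([
        "java", jvmtype,
        "-Dstorm.home=" + STORM_DIR,
        "-Dstorm.conf.file=" + CONFFILE,
        "-cp", STORM_DIR + "/lib/*:storm-starter.jar",
    ] + list(jvmopts) + [klass] + list(args))

cmd = build_command("storm.starter.WordCountTopology",
                    jvmtype="-client",
                    jvmopts=["-Dstorm.jar=storm-starter.jar"])
```

The resulting argument list is handed to os.execvp, so the Python process is replaced by the JVM running the topology class's main.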
So what does the main function of the WordCountTopology example actually do?
Besides defining the topology, it ultimately calls StormSubmitter.submitTopology(args[0], conf, builder.createTopology()) to submit the topology.
storm-starter/storm/starter/WordCountTopology.java
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);
    if (args != null && args.length > 0) {
        conf.setNumWorkers(3);
        StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
        conf.setMaxTaskParallelism(3);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
- StormSubmitter
1) Configuration parameters
The command-line parameters are put into stormConf; the configuration from conf/storm.yaml is read into conf, and then stormConf is also put into conf, so the command-line parameters take higher precedence.
stormConf is converted to JSON, because this configuration has to be sent to the server.
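The precedence can be sketched as two dict merges; the keys and values below are made up for illustration:

```python
# Sketch of the config layering in StormSubmitter.submitTopology:
# storm.yaml is read first, then the topology/command-line conf is
# merged on top, so the latter wins on conflicting keys.
yaml_conf = {"nimbus.host": "localhost", "topology.workers": 1}  # assumed storm.yaml content
storm_conf = {"topology.workers": 3, "topology.debug": True}     # topology-specific conf

conf = dict(yaml_conf)    # Map conf = Utils.readStormConfig();
conf.update(storm_conf)   # conf.putAll(stormConf);
```

After the merge, conf["topology.workers"] is 3: the topology-level setting overrides storm.yaml.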
RPC has a client side and a server side. The client holds a reference to the server-side interface; calling a method through this reference invokes the same-named method on the server's implementation class.
So the client code uses a Nimbus.Iface localNimbus reference, which is equivalent to holding an interface to the server-side Nimbus; it only needs to call methods on that interface.
storm-core/backtype/storm/utils/NimbusClient.java (note: the .java files live under storm-core/src/jvm)
public class NimbusClient extends ThriftClient {
    private Nimbus.Client _client;

    public static NimbusClient getConfiguredClient(Map conf) {
        String nimbusHost = (String) conf.get(Config.NIMBUS_HOST);
        int nimbusPort = Utils.getInt(conf.get(Config.NIMBUS_THRIFT_PORT));
        return new NimbusClient(conf, nimbusHost, nimbusPort);
    }

    public NimbusClient(Map conf, String host, int port) throws TTransportException {
        this(conf, host, port, null);
    }

    public NimbusClient(Map conf, String host, int port, Integer timeout) throws TTransportException {
        super(conf, host, port, timeout);
        _client = new Nimbus.Client(_protocol);
    }

    public Nimbus.Client getClient() { return _client; }
}
As long as the client holds a NimbusClient and knows the host and thrift port of the server-side Nimbus, it can invoke Nimbus's methods via RPC.
2) Submit Jar
StormSubmitter is essentially a Thrift client, while Nimbus is a Thrift server, so all operations are done through Thrift RPC.
It first checks topologyNameExists, fetching the state of currently running topologies through the Thrift client; then it submits the jar via these three steps:
client.getClient().beginFileUpload();
client.getClient().uploadChunk(uploadLocation, ByteBuffer.wrap(toSubmit));
client.getClient().finishFileUpload(uploadLocation);
The data is sent over via RPC; how it gets stored is Nimbus's own business...
storm-core/backtype/storm/StormSubmitter.java
private static String submittedJar = null;

private static void submitJar(Map conf, ProgressListener listener) {
    if (submittedJar == null) {
        LOG.info("Jar not uploaded to master yet. Submitting jar...");
        String localJar = System.getProperty("storm.jar"); // the jar passed to `storm jar xxx.jar`
        submittedJar = submitJar(conf, localJar, listener);
    }
}

public static String submitJar(Map conf, String localJar) {
    NimbusClient client = NimbusClient.getConfiguredClient(conf);
    try {
        String uploadLocation = client.getClient().beginFileUpload();
        LOG.info("Uploading topology jar " + localJar + " to assigned location: " + uploadLocation);
        BufferFileInputStream is = new BufferFileInputStream(localJar);
        while (true) {
            byte[] toSubmit = is.read();
            if (toSubmit.length == 0) break;
            client.getClient().uploadChunk(uploadLocation, ByteBuffer.wrap(toSubmit));
        }
        client.getClient().finishFileUpload(uploadLocation);
        LOG.info("Successfully uploaded topology jar to assigned location: " + uploadLocation);
        return uploadLocation;
    } finally {
        client.close();
    }
}
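The begin/uploadChunk/finish protocol is just a chunked streaming loop. Here is a minimal sketch with an in-memory stand-in for the Nimbus Thrift client; the class, method names, and upload location below are made up for illustration:

```python
import io

CHUNK_SIZE = 4  # BufferFileInputStream uses a much larger buffer; tiny here for illustration

class FakeNimbus:
    """Hypothetical in-memory stand-in for the Nimbus Thrift client."""
    def __init__(self):
        self.files = {}
    def begin_file_upload(self):
        loc = "/nimbus/inbox/stormjar-1234.jar"  # made-up upload location
        self.files[loc] = b""
        return loc
    def upload_chunk(self, loc, chunk):
        self.files[loc] += chunk
    def finish_file_upload(self, loc):
        return loc

def submit_jar(client, jar_bytes):
    # Mirrors StormSubmitter.submitJar: read fixed-size chunks until EOF.
    stream = io.BytesIO(jar_bytes)
    loc = client.begin_file_upload()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if len(chunk) == 0:
            break
        client.upload_chunk(loc, chunk)
    return client.finish_file_upload(loc)

nimbus = FakeNimbus()
loc = submit_jar(nimbus, b"fake-jar-contents")
```

The loop termination condition matches the Java version: a zero-length read signals end of file.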
3) Submit Topology: just a simple RPC call
storm-core/backtype/storm/StormSubmitter.java
private static Nimbus.Iface localNimbus = null; // in local mode this serves as the RPC server side

/**
 * Submits a topology to run on the cluster. A topology runs forever or until explicitly killed.
 * @param name the name of the storm, i.e. the first command-line argument.
 * @param stormConf the topology-specific configuration. See {@link Config} (Config extends HashMap).
 * @param topology the processing to execute.
 * @param opts options to manipulate the starting of the topology */
public static void submitTopology(String name, Map stormConf, StormTopology topology, SubmitOptions opts) {
    if (!Utils.isValidConf(stormConf)) {
        throw new IllegalArgumentException("Storm conf is not valid. Must be json-serializable");
    }
    stormConf = new HashMap(stormConf);            // the topology's Config settings
    stormConf.putAll(Utils.readCommandLineOpts()); // command-line -Dstorm.options=, see get_config_opts() in bin/storm
    Map conf = Utils.readStormConfig();            // storm.yaml
    conf.putAll(stormConf);                        // stormConf overrides the storm.yaml settings
    String serConf = JSONValue.toJSONString(stormConf); // serialize the Map to JSON, to be sent to the server
    if (localNimbus != null) {                     // local mode (nimbus, supervisors... all in-process)
        LOG.info("Submitting topology " + name + " in local mode");
        localNimbus.submitTopology(name, null, serConf, topology);
    } else {
        NimbusClient client = NimbusClient.getConfiguredClient(conf);
        if (topologyNameExists(conf, name)) {
            throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
        }
        submitJar(conf);
        try {
            LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
            if (opts != null) {
                client.getClient().submitTopologyWithOpts(name, submittedJar, serConf, topology, opts);
            } else { // this is for backwards compatibility
                client.getClient().submitTopology(name, submittedJar, serConf, topology);
            }
        } finally {
            client.close();
        }
    }
    LOG.info("Finished submitting topology: " + name);
}
- LocalCluster
In local mode all processes run on the same machine. LocalCluster is not written in Java but in LocalCluster.clj; the Clojure code is compiled into LocalCluster.class.
storm-starter/storm/starter/WordCountTopology.java
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
First look at the ILocalCluster interface, which LocalCluster implements.
storm-core/backtype/storm/ILocalCluster.java
public interface ILocalCluster {
    void submitTopology(String topologyName, Map conf, StormTopology topology);
    void submitTopologyWithOpts(String topologyName, Map conf, StormTopology topology, SubmitOptions submitOpts);
    void killTopology(String topologyName) throws NotAliveException;
    void killTopologyWithOpts(String name, KillOptions options) throws NotAliveException;
    void activate(String topologyName) throws NotAliveException;
    void deactivate(String topologyName) throws NotAliveException;
    void rebalance(String name, RebalanceOptions options) throws NotAliveException;
    void shutdown();
    String getTopologyConf(String id);
    StormTopology getTopology(String id);
    ClusterSummary getClusterInfo();
    TopologyInfo getTopologyInfo(String id);
    Map getState();
}
This covers the actions on a cluster: submitting a topology, killing it, activating, deactivating, rebalancing, and shutting the cluster down. The cluster information includes: configuration, topology objects, a cluster summary, topology info, and state.
1) init
storm-core/backtype/storm/LocalCluster.clj (note: the .clj files live under storm-core/src/clj)
(ns backtype.storm.LocalCluster
  (:use [backtype.storm testing config])       ;; uses backtype/storm/testing.clj and config.clj
  (:import [java.util Map])
  (:gen-class
    :init init
    :implements [backtype.storm.ILocalCluster] ;; the interface is ILocalCluster
    :constructors {[] [] [java.util.Map] []}
    :state state))                             ;; the return value of init is assigned to state
:gen-class compiles ahead of time to produce the class. http://clojuredocs.org/clojure_core/clojure.core/gen-class
:init name If supplied, names a function that will be called with the arguments to the constructor. Must return [ [superclass-constructor-args] state]
So init's arguments become the constructor arguments, and init is called by the constructor; as the name suggests, it performs initialization. Its return value is a vector whose first element holds the superclass constructor arguments.
:constructors {[param-types] [super-param-types], ...} By default, constructors are created for the generated class which match the signature(s) of the constructors for the superclass. This parameter may be used to explicitly specify constructors, each entry providing a mapping from a constructor signature to a superclass constructor signature. When you supply this, you must supply an :init specifier. Each entry (when there are several) thus maps {[subclass-params] [superclass-params]}.
LocalCluster implements ILocalCluster: it has an interface but no superclass, so super-param-types = [].
As noted, init's arguments become the constructor arguments, and init has two arities, [] and [Map]. So in {[] [] [Map] []}, the first and third signatures belong to LocalCluster, and the second and fourth, both [], to the superclass.
:state name If supplied, a public final instance field with the given name will be created. You must supply an :init function in order to provide a value for the state. That is, to use :state you must provide an init that supplies its value; only then can the (final) state field be used later.
If init receives a Map argument it returns it directly; with no arguments it calls mk-local-storm-cluster to build the Map. In Java this would be two overloaded methods with different parameters.
(defn -init
  ([] ;; first arity, no arguments: like init()
    (let [ret (mk-local-storm-cluster :daemon-conf {TOPOLOGY-ENABLE-MESSAGE-TIMEOUTS true})]
      [[] ret] ;; must return [[superclass-constructor-args] state]; the superclass args are [], so the :state declared above is assigned ret!
    ))
  ([^Map stateMap] ;; second arity takes a [Map]: constructor overloading by arity, like init(Map)
    [[] stateMap]))
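In Java terms the two -init arities would be two overloaded constructors. A rough Python analogy, where the no-argument form builds the cluster state and the map form stores it as-is (the state values here are placeholders, not what mk-local-storm-cluster actually returns):

```python
class LocalClusterSketch:
    """Illustrative analogy only; the real LocalCluster is generated by :gen-class."""
    def __init__(self, state_map=None):
        if state_map is None:
            # corresponds to the ([] ...) arity: build state via mk-local-storm-cluster
            self.state = {"nimbus": "nimbus-handle", "daemon-conf": {}}  # placeholder values
        else:
            # corresponds to the ([^Map stateMap] ...) arity: use the map as-is
            self.state = state_map

c1 = LocalClusterSketch()
c2 = LocalClusterSketch({"nimbus": "external"})
```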
init calls the defnk mk-local-storm-cluster in testing.clj. defnk differs from a plain defn in that
the parameter list can contain key/value pairs, and the body can use a key directly to get its value.
It is implemented by adding a hashmap to hold those k,v pairs. [ See "Puzzling Clojure syntax in Storm source analysis" (http://www.xuebuyuan.com/418278.html) ]
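defnk's named parameters with defaults map naturally onto Python keyword arguments; a rough analogy of the signature and call site (return value abbreviated):

```python
# Rough Python analogy of defnk: the parameter list carries key/default
# pairs, and the body uses the keys directly as local names.
def mk_local_storm_cluster(supervisors=2, ports_per_supervisor=3,
                           daemon_conf=None, inimbus=None,
                           supervisor_slot_port_min=1024):
    daemon_conf = dict(daemon_conf or {})  # :daemon-conf defaults to {}
    return {"supervisors": supervisors, "daemon-conf": daemon_conf}

# (mk-local-storm-cluster :daemon-conf {TOPOLOGY-ENABLE-MESSAGE-TIMEOUTS true})
cluster = mk_local_storm_cluster(daemon_conf={"topology.enable.message.timeouts": True})
```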
So in (mk-local-storm-cluster key value), key = :daemon-conf and value = {TOPOLOGY-ENABLE-MESSAGE-TIMEOUTS true}.
The value is itself a map, since daemon-conf is the daemon configuration and may hold multiple entries. The implementation in testing.clj supplies several more defaults:
storm-core/backtype/storm/testing.clj
;; returns map containing cluster info
;; local dir is always overridden in maps
;; can customize the supervisors (except for ports) by passing in map for :supervisors parameter
;; if need to customize amt of ports more, can use add-supervisor calls afterwards
;; LocalCluster.clj's init sets :daemon-conf; with no value given, :daemon-conf defaults to the empty map {}. Other defaults are given here too:
;; e.g. local mode has 2 supervisors (and of course a single nimbus), each with 3 ports, the lowest being 1024, then 1025 and 1026
(defnk mk-local-storm-cluster [:supervisors 2 :ports-per-supervisor 3 :daemon-conf {} :inimbus nil :supervisor-slot-port-min 1024]
  (let [zk-tmp (local-temp-path)
        [zk-port zk-handle] (zk/mk-inprocess-zookeeper zk-tmp) ;; destructuring in let: the first element of the return value goes to zk-port, the second to zk-handle
        daemon-conf (merge (read-storm-config) ;; merge map1 map2 ...
                           {TOPOLOGY-SKIP-MISSING-KRYO-REGISTRATIONS true
                            ZMQ-LINGER-MILLIS 0
                            TOPOLOGY-ENABLE-MESSAGE-TIMEOUTS false
                            TOPOLOGY-TRIDENT-BATCH-EMIT-INTERVAL-MILLIS 50
                            }
                           daemon-conf ;; the :daemon-conf {} from the defnk parameter list; daemon-conf is the map holding that kv
                           {STORM-CLUSTER-MODE "local"
                            STORM-ZOOKEEPER-PORT zk-port
                            STORM-ZOOKEEPER-SERVERS ["localhost"]})
        nimbus-tmp (local-temp-path)
        port-counter (mk-counter supervisor-slot-port-min) ;; a :key parameter can be used by its bare name
        nimbus (nimbus/service-handler ;; the nimbus service handler; this is a server-side function
                 (assoc daemon-conf STORM-LOCAL-DIR nimbus-tmp) ;; assoc map key value
                 (if inimbus inimbus (nimbus/standalone-nimbus))) ;; if inimbus is nil, create a standalone nimbus; the last expression is the return value
        context (mk-shared-context daemon-conf)
        cluster-map {:nimbus nimbus ;; a single nimbus
                     :port-counter port-counter
                     :daemon-conf daemon-conf
                     :supervisors (atom []) ;; initialized as an atom wrapping a vector
                     :state (mk-distributed-cluster-state daemon-conf) ;; one might take this for the :state value in LocalCluster.clj, but see the submitTopology discussion below
                     :storm-cluster-state (mk-storm-cluster-state daemon-conf)
                     :tmp-dirs (atom [nimbus-tmp zk-tmp])
                     :zookeeper zk-handle ;; the zookeeper handle
                     :shared-context context}
        supervisor-confs (if (sequential? supervisors) ;; both lists and vectors are sequential?, but only lists are seqs
                           supervisors
                           (repeat supervisors {}))] ;; if not a sequential, repeat an empty conf map per supervisor
    (doseq [sc supervisor-confs] ;; iterates, binding sc like in let
      (add-supervisor cluster-map :ports ports-per-supervisor :conf sc)) ;; create each supervisor from its config and register it in cluster-map
    cluster-map ;; the return value: the {:key value ...} map above
    ))
add-supervisor is again a defnk function.
storm-core/backtype/storm/testing.clj
(defnk add-supervisor [cluster-map :ports 2 :conf {} :id nil]
  (let [tmp-dir (local-temp-path)
        port-ids (if (sequential? ports) ports (doall (repeatedly ports (:port-counter cluster-map))))
        supervisor-conf (merge (:daemon-conf cluster-map)
                               conf
                               {STORM-LOCAL-DIR tmp-dir
                                SUPERVISOR-SLOTS-PORTS port-ids})
        id-fn (if id (fn [] id) supervisor/generate-supervisor-id) ;; how the supervisor UUID is generated
        daemon (with-var-roots [supervisor/generate-supervisor-id id-fn] (supervisor/mk-supervisor supervisor-conf (:shared-context cluster-map) (supervisor/standalone-supervisor)))]
    (swap! (:supervisors cluster-map) conj daemon) ;; swap! updates the :supervisors atom in cluster-map
    (swap! (:tmp-dirs cluster-map) conj tmp-dir)
    daemon ;; daemon, the supervisor returned by mk-supervisor, is the body's return value
    ))
with-var-roots takes bindings as its first argument and treats the rest (&) as the body, inside which the supervisor is created. Like nimbus/service-handler above, mk-supervisor is a defserverfn server function.
The body, (supervisor/mk-supervisor supervisor-conf (:shared-context cluster-map) (supervisor/standalone-supervisor)), takes three arguments and finally returns the supervisor.
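with-var-roots temporarily rebinds a var for the duration of the body, here so that a caller-supplied :id overrides the UUID generator. In Python terms it is like patching a module-level function and restoring it afterwards; a minimal sketch with made-up function names:

```python
import uuid

def generate_supervisor_id():
    # default id generator, like supervisor/generate-supervisor-id
    return str(uuid.uuid4())

def with_var_root(bindings, name, replacement, body):
    """Temporarily rebind bindings[name] to replacement while body runs.
    Simplified analogy of Clojure's with-var-roots."""
    original = bindings[name]
    bindings[name] = replacement
    try:
        return body()
    finally:
        bindings[name] = original  # restore the original binding

fns = {"generate_supervisor_id": generate_supervisor_id}

def mk_supervisor():
    # mk-supervisor calls whatever generate-supervisor-id is currently bound to
    return {"id": fns["generate_supervisor_id"]()}

# add-supervisor with :id "sup-1" rebinds the generator to a constant fn
daemon = with_var_root(fns, "generate_supervisor_id",
                       lambda: "sup-1", mk_supervisor)
```

After the body returns, the original generator is back in place, so later supervisors still get fresh UUIDs.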
2) submitTopology
Back on the main line of WordCountTopology's LocalCluster.submitTopology:
storm-core/backtype/storm/LocalCluster.clj
(defn -submitTopology [this name conf topology] ;; the first parameter is the LocalCluster instance
  ;; (:nimbus map) fetches the :nimbus value from the map. We know init returns a Map containing the key :nimbus, so state is ret!
  (submit-local-topology (:nimbus (. this state)) ;; the Nimbus object under key :nimbus in init's return map ret
                         name conf topology)) ;; same parameters as StormSubmitter.submitTopology: topology name, conf, StormTopology
Earlier I assumed the :state state declaration at the top referred to the value of the :state key inside init()'s return Map (that Map does indeed contain a key named :state).
But (:nimbus (. this state)) here shows otherwise: if this.state held that inner value, the (:key map) lookup would not work. So the :state at the top is the entire return value of init()!
In Clojure a HashMap is written as map = {:key value}, and a key is read back with value = (:key map).
storm-core/backtype/storm/testing.clj
(defn submit-local-topology [nimbus storm-name conf topology]
  (when-not (Utils/isValidConf conf) ;; validate the conf; roughly the same flow as StormSubmitter.submitTopology
    (throw (IllegalArgumentException. "Topology conf is not json-serializable")))
  ;; likewise calls nimbus's submitTopology; since this is local, no jar upload is needed. The conf is converted to JSON too.
  (.submitTopology nimbus storm-name nil (to-json conf) topology))
Clojure's to-json does exactly what String serConf = JSONValue.toJSONString(stormConf); did earlier in StormSubmitter.submitTopology.
storm-core/backtype/storm/util.clj
(defn to-json [obj]
  (JSONValue/toJSONString obj)) ;; calls the static method JSONValue/toJSONString