HBase row count methods
Counting with Hive
- Create a Hive external table mapped to the HBase table
CREATE EXTERNAL TABLE LJKTEST(
  ID STRING,
  AGE STRING,
  NAME STRING,
  COMPANY STRING,
  SCHOOL STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:AGE,0:NAME,0:COMPANY,0:SCHOOL")
TBLPROPERTIES("hbase.table.name" = "LJKTEST");
- Run the Hive count SQL.
Note that COUNT(1) and COUNT(*) do not work here; they both return 0. Only COUNT(column) works, which is probably related to the column mapping mechanism.
SELECT COUNT(ID) FROM LJKTEST;
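For reference, a minimal sketch of running the same query non-interactively through beeline (the HiveServer2 URL below is an assumption; adjust host and port to your cluster):
beeline -u jdbc:hive2://localhost:10000 -e "SELECT COUNT(ID) FROM LJKTEST;"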
Counting with Phoenix
SELECT COUNT(1) FROM LJKTEST;
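The statement can be run from the Phoenix sqlline client. A sketch, assuming an HDP-style client path and the dn1:2181 ZooKeeper quorum used in the Spark example further down:
/usr/hdp/current/phoenix-client/bin/sqlline.py dn1:2181
Then issue SELECT COUNT(1) FROM LJKTEST; at the sqlline prompt.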
Counting with the native HBase RowCounter
HADOOP_CLASSPATH=`hbase classpath` hadoop jar /usr/hdp/current/hbase-client/lib/hbase-server.jar rowcounter LJKTEST
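rowcounter runs as a MapReduce job. For smaller tables, the HBase shell's count command is a single-client alternative; a sketch (INTERVAL controls progress reporting, CACHE the scanner caching):
echo "count 'LJKTEST', INTERVAL => 100000, CACHE => 10000" | hbase shell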
Counting with Spark
Counting HBase with Spark
Add the dependencies to the pom file. <font color="red">The exclusions must be added, otherwise you will get the error: class "javax.servlet.FilterRegistration"'s signer information does not match signer information of other classes in the same package</font>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.1.2.2.5.0.0-1245</version>
    <exclusions>
        <exclusion>
            <artifactId>servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jetty</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jetty-util</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
        <exclusion>
            <artifactId>servlet-api-2.5</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.2.2.5.0.0-1245</version>
</dependency>
Write the Spark code to count the HBase table
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext
import org.junit.Test

@Test
def sparkCountHBase(): Unit = {
  val sc = new SparkContext("local", "hbase-test")
  // Point TableInputFormat at the target HBase table
  val conf = HBaseConfiguration.create()
  conf.set(TableInputFormat.INPUT_TABLE, "LJKTEST")
  // Each RDD element is one HBase row (rowkey + Result)
  val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
    classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
    classOf[org.apache.hadoop.hbase.client.Result])
  val count = hbaseRDD.count()
  println(s"Total: $count rows!")
  sc.stop()
}
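If only part of the table needs to be counted, the scan handed to TableInputFormat can be narrowed before building the RDD. A sketch using standard TableInputFormat scan properties (the column family "0" matches the Hive mapping above; the cache value is just an example):
// Set these on conf before calling newAPIHadoopRDD
conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "0")   // restrict the scan to the "0" family
conf.set(TableInputFormat.SCAN_CACHEDROWS, "10000")  // example scanner caching value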
Counting Phoenix with Spark
Add the dependencies to the pom
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark</artifactId>
    <version>4.7.0-HBase-1.1</version>
    <exclusions>
        <exclusion>
            <artifactId>servlet-api</artifactId>
            <groupId>javax.servlet</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jetty</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jetty-util</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
        <exclusion>
            <artifactId>servlet-api-2.5</artifactId>
            <groupId>org.mortbay.jetty</groupId>
        </exclusion>
    </exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.phoenix/phoenix-spark -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.2.2.5.0.0-1245</version>
</dependency>
Spark code to count via Phoenix
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.junit.Test

@Test
def sparkCountPhoenix(): Unit = {
  val sc = new SparkContext("local", "phoenix-test")
  val sqlContext = new SQLContext(sc)
  // Load the Phoenix table as a DataFrame via the phoenix-spark connector
  val df = sqlContext.load(
    "org.apache.phoenix.spark",
    Map("table" -> "LJKTEST", "zkUrl" -> "dn1:2181")
  )
  // df.show()
  println(s"Total: ${df.count} rows!")
  sc.stop()
}
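sqlContext.load is deprecated in Spark 1.6; an equivalent sketch using the DataFrameReader API, with the same table and zkUrl as above:
// Non-deprecated equivalent of sqlContext.load in Spark 1.6
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "LJKTEST", "zkUrl" -> "dn1:2181"))
  .load()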