
1. Data Types

Reference:
https://cwiki.apache.org/conf...

1.1 Basic data types

Type        Description                                                                     Example
TINYINT     1-byte signed integer                                                           20
SMALLINT    2-byte signed integer                                                           20
INT         4-byte signed integer                                                           20
BIGINT      8-byte signed integer                                                           20
BOOLEAN     boolean, true or false                                                          TRUE, FALSE
FLOAT       single-precision floating point                                                 3.14159
DOUBLE      double-precision floating point                                                 3.14159
STRING      character sequence; a character set can be specified; single or double quotes  'now is the time', "for all good men"
TIMESTAMP   point-in-time type
BINARY      byte array

Hive's STRING type corresponds to a database VARCHAR: it is a variable-length string, except that you cannot declare a maximum number of characters; in theory it can hold up to 2GB of characters.
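As a quick illustration, here is a minimal sketch of a table that declares each of these basic types (the table and column names are made up for this example):

create table if not exists type_demo(
tiny_col   tinyint,
small_col  smallint,
int_col    int,
big_col    bigint,
bool_col   boolean,
float_col  float,
double_col double,
str_col    string,
ts_col     timestamp,
bin_col    binary
);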

1.2 Complex (collection) data types

STRUCT: similar to a struct in C; fields are accessed with dot notation. For example, if a column has type STRUCT{first STRING, last STRING}, its first element is referenced as column.first. Declared as, e.g., struct<street:string, city:string>.
MAP: a set of key-value pairs, accessed with array notation. For example, if a column's type is MAP and its key->value pairs are 'first'->'John' and 'last'->'Doe', the last element is fetched as column['last']. Declared as, e.g., map<string, int>.
ARRAY: a collection of variables sharing one type and name, called the array's elements, each numbered starting from zero. For example, given the array value ['John', 'Doe'], the second element is referenced as column[1]. Declared as, e.g., array<string>.

ARRAY and MAP behave like Java's Array and Map, while STRUCT is like a C struct in that it wraps a named set of fields. Complex types can be nested to arbitrary depth.
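A minimal sketch of declaring and accessing these types (the person_demo table and its delimiters are invented for illustration):

create table if not exists person_demo(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':';

-- array by index, map by key, struct by field name
select friends[1], children['xiaoming'], address.city from person_demo;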

1.3 Type conversion

  • Implicit conversion rules

1) Any integer type can be implicitly converted to a wider integer type, e.g. TINYINT to INT, or INT to BIGINT.
2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
4) BOOLEAN cannot be converted to any other type.

Example

0: jdbc:hive2://hadoop10:10000> select '365'+2
. . . . . . . . . . . . . . . > ;
INFO  : Compiling command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b): select '365'+2
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b); Time taken: 2.064 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b): select '365'+2
INFO  : Completed executing command(queryId=v2admin_20210109163104_2d9d7430-013d-45f2-9b55-e46e1105366b); Time taken: 0.002 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+--------+
|  _c0   |
+--------+
| 367.0  |
+--------+
  • Use the CAST operator for explicit type conversion

For example, CAST('365' AS INT) converts the string '365' into the integer 365. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.
Example

0: jdbc:hive2://hadoop10:10000> select cast('365' as int)+2, cast('365a' as int)+2;
INFO  : Compiling command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab): select cast('365' as int)+2, cast('365a' as int)+2
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null), FieldSchema(name:_c1, type:int, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab); Time taken: 0.225 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab): select cast('365' as int)+2, cast('365a' as int)+2
INFO  : Completed executing command(queryId=v2admin_20210109163353_5fedabc9-f213-4ca2-89e4-0a4e27598fab); Time taken: 0.001 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+------+-------+
| _c0  |  _c1  |
+------+-------+
| 367  | NULL  |
+------+-------+

2. Data Definition

2.1 Creating a database

1) Create a database named mydb

0: jdbc:hive2://hadoop10:10000> create database if not exists mydb;

2) Create a database at a specified HDFS location

create database demo1_db location '/db/demo1_db';

2.2 Listing databases: show databases;

0: jdbc:hive2://hadoop10:10000> show databases;
INFO  : Compiling command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec); Time taken: 0.014 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109164417_b436551a-5605-4367-9600-904e693b37ec); Time taken: 0.012 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
| demo_db2       |
| mydb           |
+----------------+
3 rows selected (0.056 seconds)

2.3 Viewing database details: desc database mydb;

0: jdbc:hive2://hadoop10:10000> desc database db;
Error: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: db (state=42000,code=10072)
0: jdbc:hive2://hadoop10:10000> desc database mydb;
INFO  : Compiling command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9): desc database mydb
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9); Time taken: 0.038 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9): desc database mydb
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109164555_ab020ad8-6e97-4a62-b7cd-0a71045f9cd9); Time taken: 0.005 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
| db_name  |   comment    |                     location                      | owner_name  | owner_type  | parameters  |
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
| mydb     | My first db  | hdfs://hadoop10:9820/user/hive/warehouse/mydb.db  | v2admin     | USER        |             |
+----------+--------------+---------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.069 seconds)

2.4 Switching the current database

use mydb;

Ha, by this point you have probably noticed that this is pretty much the SQL we already know, which should give you a feel for some of the benefits of using Hive.

2.5 Dropping a database

My advice here: don't memorize this one; look it up whenever you actually need it, to avoid dropping a database by mistake.
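That said, if you ever do need it, the basic form is below (demo_db2 is just the sample database listed earlier; CASCADE is required when the database still contains tables), so handle it with care:

drop database if exists demo_db2;          -- fails if the database is not empty
drop database if exists demo_db2 cascade;  -- also drops any tables inside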

2.6 Creating tables

This differs a bit from MySQL. The syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
[(col_name data_type [COMMENT col_comment], ...)] 
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
[CLUSTERED BY (col_name, col_name, ...) 
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format] 
[STORED AS file_format] 
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

1) CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
2) The EXTERNAL keyword creates an external table and lets you point at the actual data with LOCATION at creation time. When a managed (internal) table is dropped, its metadata and data are deleted together; dropping an external table deletes only the metadata, never the data.
3) COMMENT: adds a comment to the table or a column.
4) PARTITIONED BY creates a partitioned table (see the sketch after this list).
5) CLUSTERED BY creates a bucketed table.
6) SORTED BY additionally sorts each bucket by one or more columns.
7) ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
When creating a table you can define a custom SerDe or use one of the built-in ones; if neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. Along with the table's columns, a custom SerDe can be specified, and Hive uses the SerDe to determine the data for each column of the table.
SerDe is short for Serialize/Deserialize; Hive uses a SerDe to serialize and deserialize rows.
8) STORED AS specifies the storage file format.
Common formats: SEQUENCEFILE (binary sequence files), TEXTFILE (plain text), RCFILE (columnar storage).
If the data is plain text, use STORED AS TEXTFILE; if the data needs compression, use STORED AS SEQUENCEFILE.
9) LOCATION: specifies the table's storage path on HDFS.
10) AS: followed by a query; the table is created from the query result.
11) LIKE lets you copy an existing table's structure without copying its data.
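Two clauses from the list that the examples below don't otherwise exercise are PARTITIONED BY and LIKE. A minimal sketch (stu_part and stu_like are hypothetical names; stu1 is the table created in the next subsection):

create table if not exists stu_part(
id int,
name string
)
comment 'student table partitioned by date'
partitioned by (dt string)
row format delimited fields terminated by '\t';

create table if not exists stu_like like stu1;   -- same structure as stu1, no data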

2.6.1 Creating a managed table

create table if not exists stu1(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/demo01_db.db';

Check with show tables:

0: jdbc:hive2://hadoop10:10000> show tables;
INFO  : Compiling command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1): show tables
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1); Time taken: 0.031 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109170622_440b4273-c978-4fbb-8ca4-e727cc3ecbb1); Time taken: 0.025 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+
| tab_name  |
+-----------+
| stu1      |
+-----------+

Now look at the table's structure:

0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO  : Compiling command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e): desc stu1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e); Time taken: 0.071 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e): desc stu1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109170725_25f84e5c-cb26-4df7-a29f-a64c5087f76e); Time taken: 0.017 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| name      | string     |          |
+-----------+------------+----------+

2.6.2 External tables

An external table is a table that Hive does not fully control: dropping it does not delete the underlying data, only the metadata for the table.

So when should you use an external table, and when a managed table?
Take website logs as an example: the raw data goes into external tables, while the intermediate and result tables used for analysis are managed (internal) tables.

Example:
1) Prepare the data
vim stu.txt

1001    lisi
1002    zhangsan
1003    wangwu
1004    zhalou

2) Upload it to HDFS

[v2admin@hadoop10 demo]$ hadoop fs -put stu.txt /demofile
2021-01-09 17:26:08,616 INFO  [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
2021-01-09 17:26:09,162 INFO  [Thread-7] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

3) Create the external table

create external table if not exists stu2(
id int,
name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/demofile';

4) Query the table

0: jdbc:hive2://hadoop10:10000> select * from stu2;
INFO  : Compiling command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83): select * from stu2
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu2.id, type:int, comment:null), FieldSchema(name:stu2.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83); Time taken: 0.193 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83): select * from stu2
INFO  : Completed executing command(queryId=v2admin_20210109172702_4db60baa-51ee-442d-b486-b5b22d981d83); Time taken: 0.0 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------+------------+
| stu2.id  | stu2.name  |
+----------+------------+
| 1001     | lisi       |
| 1002     | zhangsan   |
| 1003     | wangwu     |
| 1004     | zhalou     |
+----------+------------+

5) Drop the external table

0: jdbc:hive2://hadoop10:10000> drop table stu2;
INFO  : Compiling command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5): drop table stu2
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5); Time taken: 0.037 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5): drop table stu2
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109172804_97f8538f-f5d6-4f94-a169-f0d170f57da5); Time taken: 0.245 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.327 seconds)

6) Check the raw data on HDFS

[v2admin@hadoop10 demo]$ hadoop fs -cat /demofile/stu.txt
2021-01-09 17:29:11,911 INFO  [main] Configuration.deprecation (Configuration.java:logDeprecation(1395)) - No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
2021-01-09 17:29:12,445 INFO  [main] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
1001    lisi
1002    zhangsan
1003    wangwu
1004    zhalou

The data is still there, but the corresponding metadata has been deleted.
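Related and worth knowing: a table can be flipped between managed and external with TBLPROPERTIES. A minimal sketch (note that in many Hive versions the 'TRUE'/'FALSE' values here are case-sensitive and must be uppercase):

alter table stu1 set tblproperties('EXTERNAL'='TRUE');   -- managed to external
alter table stu1 set tblproperties('EXTERNAL'='FALSE');  -- external to managed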

2.7 Altering tables

2.7.1 Renaming a table

ALTER TABLE table_name RENAME TO new_table_name
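For instance (stu_old and stu_new are hypothetical names, not tables used elsewhere in this post):

alter table stu_old rename to stu_new;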

2.7.2 Adding, changing, and replacing columns

Syntax to change a column:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Syntax to add or replace columns:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...) 

Example
Check the table structure:

0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO  : Compiling command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5): desc stu1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5); Time taken: 0.036 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5): desc stu1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109173448_890d225c-c89e-4f41-aae3-a75dcd6845c5); Time taken: 0.016 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| name      | string     |          |
+-----------+------------+----------+

Add a column:

0: jdbc:hive2://hadoop10:10000> alter table stu1 add columns(age int);

Check the structure again:

0: jdbc:hive2://hadoop10:10000> desc stu1;
INFO  : Compiling command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4): desc stu1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string, comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4); Time taken: 0.04 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4): desc stu1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=v2admin_20210109173722_c490c030-015b-447f-a38b-a14197c70ef4); Time taken: 0.014 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| name      | string     |          |
| age       | int        |          |
+-----------+------------+----------+

Change a column:

0: jdbc:hive2://hadoop10:10000> alter table stu1 change column age desc string;

Check the structure once more:

0: jdbc:hive2://hadoop10:10000> desc stu1;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| name      | string     |          |
| desc      | string     |          |
+-----------+------------+----------+
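REPLACE COLUMNS, the one variant not demonstrated above, swaps out the entire column list. A sketch that would restore stu1 to its original two columns (like the other ALTERs, it only rewrites metadata, not the underlying data files):

alter table stu1 replace columns(id int, name string);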

3. Data Operations

Hive data operations come down to two things: importing data and exporting data.

3.1 Importing data

3.1.1 Loading data directly into a table: load

  • 1. Syntax
hive> load data [local] inpath 'path to the data' [overwrite] into table table_name [partition (partcol1=val1,…)];
Option       Description
load data    the load statement
local        load from the local filesystem; without it, the path is read from HDFS
inpath       the path of the data to load
overwrite    overwrite the table's existing data; without it, the data is appended
into table   which table to load into
table_name   the name of the target table
partition    load into the specified partition
  • 2. Example

1) Create a table stu1

 0:jdbc:hive2://hadoop10:10000> create table stu1(
. . . . . . . . . . . . . . . > id int,
. . . . . . . . . . . . . . . > name string)
. . . . . . . . . . . . . . . > row format delimited fields terminated by '\t';

2) Prepare the data

[v2admin@hadoop10 demo]$ cat stu.txt 
1001    lisi
1002    zhangsan
1003    wangwu
1004    zhalou

3) Put a copy on HDFS

[v2admin@hadoop10 demo]$ hadoop fs -put stu.txt /student

4) Load the local file into Hive's stu1 table

0: jdbc:hive2://hadoop10:10000> load data local inpath '/home/v2admin/demo/stu.txt' into table demo01_db.stu1;

Let's check that the table has data:

0: jdbc:hive2://hadoop10:10000> select * from stu1;
+----------+------------+
| stu1.id  | stu1.name  |
+----------+------------+
| 1001     | lisi       |
| 1002     | zhangsan   |
| 1003     | wangwu     |
| 1004     | zhalou     |
+----------+------------+

5) Load the file from HDFS into Hive

0: jdbc:hive2://hadoop10:10000> load data inpath '/student/stu.txt' into table demo01_db.stu1;

Look at the table contents (the HDFS load appended a second copy of the rows):

+----------+------------+
| stu1.id  | stu1.name  |
+----------+------------+
| 1001     | lisi       |
| 1002     | zhangsan   |
| 1003     | wangwu     |
| 1004     | zhalou     |
| 1001     | lisi       |
| 1002     | zhangsan   |
| 1003     | wangwu     |
| 1004     | zhalou     |
+----------+------------+

3.1.2 Inserting data: insert

insert into table stu1 values(1005,'aaa'),(1006,'bbb'),(1007,'ccc'); 

insert into appends data;
insert overwrite replaces the table's existing data, as sketched below.
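Both forms can also take a query instead of a VALUES list. A sketch (stu_backup is a hypothetical staging table created here just for the example):

create table if not exists stu_backup like stu1;

-- append the query result to stu_backup
insert into table stu_backup select id, name from stu1;

-- replace whatever stu_backup held with the query result
insert overwrite table stu_backup select id, name from stu1 where id < 1005;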

3.1.3 Loading via LOCATION: point the table at existing data at creation time

0: jdbc:hive2://hadoop10:10000> create external table stu2(
. . . . . . . . . . . . . . . > id int,
. . . . . . . . . . . . . . . > name string)
. . . . . . . . . . . . . . . > row format delimited fields terminated by '\t'
. . . . . . . . . . . . . . . > location '/student';

Let's look at the table:

0: jdbc:hive2://hadoop10:10000> select * from stu2;
+----------+------------+
| stu2.id  | stu2.name  |
+----------+------------+
| 1001     | lisi       |
| 1002     | zhangsan   |
| 1003     | wangwu     |
| 1004     | zhalou     |
+----------+------------+

These are just variations on table creation; use them as needed.

3.1.4 import

Since we're talking about importing, there is of course an import command. It requires a prior export, so see the data-export section below.

3.2 Exporting data

3.2.1 export: exporting to HDFS

0: jdbc:hive2://hadoop10:10000>  export table demo01_db.stu2 to '/user/hive/warehouse/export/stu2';

The matching import:

0: jdbc:hive2://hadoop10:10000>  import table stu3  from
 '/user/hive/warehouse/export/stu2';

Used together, they are mainly for migrating Hive tables between two Hadoop clusters.
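import also takes a couple of options worth knowing; a hedged sketch (stu4 and the target path are hypothetical, and only one of these alternatives would be run):

import external table stu4 from '/user/hive/warehouse/export/stu2';

-- or import while placing the data at a specific HDFS location
import table stu4 from '/user/hive/warehouse/export/stu2' location '/demo/stu4';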

3.2.2 Exporting with insert

Insert can export data? It really can; I remember being puzzled the first time I saw it too. Just commit it to memory.
Example: export a query result to the local filesystem

0: jdbc:hive2://hadoop10:10000> insert overwrite local directory '/home/v2admin/demo/tmp'
. . . . . . . . . . . . . . . > select * from stu1;

Without local, the export goes to HDFS instead. Let's look at the exported file:

1001^Alisi
1002^Azhangsan
1003^Awangwu
1004^Azhalou
1001^Alisi
1002^Azhangsan
1003^Awangwu
1004^Azhalou
1005^Aaaa
1006^Abbb
1007^Accc

Not quite what we expected; the export can be formatted, like this:

insert overwrite local directory '/home/v2admin/demo/tmp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from stu1;

Check the file again:

1001    lisi
1002    zhangsan
1003    wangwu
1004    zhalou
1001    lisi
1002    zhangsan
1003    wangwu
1004    zhalou
1005    aaa
1006    bbb
1007    ccc

Now that's the format we wanted.

3.2.3 Exporting via the Hive shell

[v2admin@hadoop10 tmp]$ hive -e 'select * from demo01_db.stu2;' > stu2.txt; 

There are many ways to import and export; use whichever fits.

4. Queries

These work essentially like the SQL we already know, with few differences. For example:
1) Select everything

select * from stu1;

2) Select specific columns

select id,name from stu1;

3) Column aliases

select id as stu_num, name as stu_name from stu1;

4) where clause

select * from stu1 where id=1001;

5) limit clause

select * from stu1 limit 3;

Grouping with group by, filtering groups with having, and multi-table joins all differ little from standard SQL; a sketch follows.
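For instance, a grouped join (the score table, with columns stu_id and score, is invented for this sketch):

select s.name, avg(sc.score) as avg_score
from stu1 s
join score sc on s.id = sc.stu_id
group by s.name
having avg(sc.score) > 60;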
That's why Hive is so easy to pick up: it greatly reduces the learning cost.


