2014-6-14
Apache Hive is a data warehouse built on top of Hadoop that gives developers and analysts a SQL-like command-line interface. Hive is essentially an abstraction over HDFS and MapReduce: its SQL statements are compiled into MapReduce jobs, and the data it processes normally lives in HDFS.
For setting up pseudo-distributed Hadoop on a single machine, see the earlier post on configuring Hadoop 1.2 in pseudo-distributed mode.
The Hive homepage is here. From the download page, get apache-hive-0.13.1-bin.tar.gz, extract it, rename the directory to hive-0.13.1, and place it under ~/.
Setting HIVE_HOME
Add the following line to /etc/profile:
export HIVE_HOME=/home/letian/hive-0.13.1
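Note that /etc/profile is only read by login shells, so after editing it you need to either log in again or `source` the file. A minimal sketch of the change and a quick check (the path matches the install location above):

```shell
# The line added to /etc/profile; adjust the path to your install location.
export HIVE_HOME=/home/letian/hive-0.13.1

# The variable is visible in the current shell once exported
# (or after `source /etc/profile` in an existing session).
echo "$HIVE_HOME"    # prints /home/letian/hive-0.13.1
```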
Creating the required directories on HDFS
Start Hadoop 1.2:
$ start-all.sh
Create the directory /tmp on HDFS and grant group write permission:
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /tmp
Create the directory /user/hive/warehouse on HDFS and grant group write permission:
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /user/hive/warehouse
Where does /user/hive/warehouse come from? The default configuration template hive-default.xml.template under hive-0.13.1/conf/ contains the following:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
As for /tmp, the template contains the following setting (note that this particular property is a directory on the local filesystem for query logs; the /tmp directory on HDFS serves as Hive's scratch space, which is governed by hive.exec.scratchdir):
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
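These defaults can be overridden by creating hive-0.13.1/conf/hive-site.xml; the .template file is meant as a reference to copy from, not to be edited in place. A minimal sketch that just restates the default warehouse path explicitly (values here are illustrative):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Warehouse directory on HDFS; matches the directory created above -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```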
Trying out Hive
For convenience, first add hive-0.13.1/bin to the $PATH variable.
Type the hive command to enter the command-line interface:
$ hive
Now we can use familiar SQL, although Hive's dialect is not fully identical to standard SQL.
Create a user table:
hive> CREATE TABLE user (name STRING, age INT, email STRING);
OK
Time taken: 0.495 seconds
List the existing tables:
hive> SHOW TABLES;
OK
user
Time taken: 0.025 seconds, Fetched: 1 row(s)
Use the dfs command to inspect HDFS from inside the Hive shell:
hive> dfs -ls /user/hive/warehouse/;
Found 1 items
drwxr-xr-x - letian supergroup 0 2014-06-14 10:16 /user/hive/warehouse/user
Examine the structure of the user table:
hive> DESCRIBE user;
OK
name string
age int
email string
Time taken: 0.374 seconds, Fetched: 3 row(s)
Inserting data:
Hive (as of 0.13) does not support row-level inserts, so data is loaded from files instead.
Create a file user.dat with the following content:
letian 22 letian@123.com
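The fields in this file should be tab-separated. A sketch of creating it from the shell (written to /tmp/user.dat here for illustration; the LOAD command below expects /home/letian/user.dat):

```shell
# Create the sample data file with tab-separated fields
printf 'letian\t22\tletian@123.com\n' > /tmp/user.dat

# Inspect it; with GNU cat, -A renders each tab as ^I and line ends as $
cat -A /tmp/user.dat
```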
Load the data into the table:
hive> LOAD DATA LOCAL INPATH '/home/letian/user.dat' OVERWRITE INTO TABLE user;
Copying data from file:/home/letian/user.dat
Copying file: file:/home/letian/user.dat
Loading data to table default.user
Deleted hdfs://localhost:9000/user/hive/warehouse/user
Table default.user stats: [numFiles=1, numRows=0, totalSize=25, rawDataSize=0]
OK
Time taken: 0.674 seconds
Query the user table (this launches a MapReduce job):
hive> SELECT name FROM user;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201406140958_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201406140958_0001
Kill Command = /home/letian/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201406140958_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-14 10:40:49,060 Stage-1 map = 0%, reduce = 0%
2014-06-14 10:40:51,091 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
2014-06-14 10:40:53,107 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_201406140958_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.06 sec HDFS Read: 234 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 60 msec
OK
letian 22 letian@123.com
Time taken: 11.38 seconds, Fetched: 1 row(s)
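Notice that SELECT name printed the entire line rather than just letian. The table was created without a ROW FORMAT clause, so Hive falls back to its default field delimiter, Ctrl-A (\001); a tab-separated file is therefore not split into columns, the whole line lands in the first STRING column, and age/email come back NULL. If user.dat is tab-separated, declare the delimiter when creating the table (a sketch, not taken from the original session):

```sql
hive> DROP TABLE user;
hive> CREATE TABLE user (name STRING, age INT, email STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH '/home/letian/user.dat' OVERWRITE INTO TABLE user;
hive> SELECT name FROM user;
```

With the delimiter declared, the same query should return only the name column.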
Exit Hive:
hive> exit;