2014-10-16
Downloaded a stable Hadoop 2.4.1 from http://www.carfab.com/apachesoftware/hadoop/common/stable2/.
Operating system: Linux Mint 16, 64-bit.
Configure Java and SSH
Set JAVA_HOME and related variables, and configure SSH so that the following command works without a confirmation prompt or a password:
$ ssh localhost
For details, see the earlier article on configuring Hadoop 1.2 in pseudo-distributed mode.
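If SSH is not set up yet, a minimal sketch of the usual passwordless setup (assuming openssh-server is installed and running; the key type and paths are the common defaults, not taken from that article):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa      # generate a key with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost                                 # should now log in without a password prompt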
Configure environment variables
Move the downloaded hadoop-2.4.1 to /home/letian/hadoop-2.4.1. Then create the directory /home/letian/hadoop-env/2.4.1, and under it the directories hdfs/namenode and hdfs/datanode; these paths must match the ones configured in hdfs-site.xml below.
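A one-liner sketch for creating them (paths as above; brace expansion works in bash and zsh):
$ mkdir -p /home/letian/hadoop-env/2.4.1/hdfs/{namenode,datanode}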
Add the following to /etc/profile or ~/.zshrc:
# set hadoop 2.4.1
export HADOOP_PREFIX=/home/letian/hadoop-2.4.1
export HADOOP_HOME=$HADOOP_PREFIX
export PATH=$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_PREFIX/lib"
Then reload whichever file you edited:
$ source /etc/profile
$ source ~/.zshrc
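To confirm the variables took effect in the current shell, a quick check:
$ echo $HADOOP_HOME
$ hadoop version    # works only if PATH now includes $HADOOP_PREFIX/bin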
Disable IPv6, which Hadoop of this era does not handle well, by adding the following to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Then restart networking:
$ sudo service networking restart
If that fails, just reboot the machine.
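Alternatively, sysctl can reload the new settings without a network restart, and you can verify the result (a value of 1 means IPv6 is disabled):
$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6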
Edit the Hadoop configuration files
hadoop-2.4.1/etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- dfs.namenode.name.dir / dfs.datanode.data.dir are the Hadoop 2.x names;
         the legacy dfs.name.dir / dfs.data.dir still work but are deprecated -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/letian/hadoop-env/2.4.1/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/letian/hadoop-env/2.4.1/hdfs/datanode</value>
    </property>
</configuration>
hadoop-2.4.1/etc/hadoop/core-site.xml:
<configuration>
    <!-- fs.defaultFS is the Hadoop 2.x name for the deprecated fs.default.name -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/letian/hadoop-env/tmp</value>
    </property>
</configuration>
hadoop-2.4.1/etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8050</value>
    </property>
</configuration>
hadoop-2.4.1/etc/hadoop/mapred-site.xml does not exist by default, so create it from the template first:
$ cp hadoop-2.4.1/etc/hadoop/mapred-site.xml.template hadoop-2.4.1/etc/hadoop/mapred-site.xml
Then edit hadoop-2.4.1/etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Format HDFS
The following command is deprecated and not recommended:
$ ./bin/hadoop namenode -format
Use this instead:
$ ./bin/hdfs namenode -format
Start Hadoop
Start HDFS and YARN:
$ start-dfs.sh
$ start-yarn.sh
Check the running Java processes:
$ jps
5022 Jps
2883 DataNode
3359 NodeManager
3022 SecondaryNameNode
3258 ResourceManager
2785 NameNode
To check the NameNode status, visit http://localhost:50070/ in a browser.
To check the SecondaryNameNode status, visit http://localhost:50090/.
To see running MapReduce jobs, visit http://localhost:8088/.
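If you prefer the command line, a quick reachability check with curl (just a convenience, not part of the original setup):
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/    # expect 200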
You can also check the cluster status from the command line (as the output notes, the hdfs dfsadmin form is now preferred):
zsh >> hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
14/10/16 15:12:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 127824457728 (119.05 GB)
Present Capacity: 96126644224 (89.52 GB)
DFS Remaining: 96126595072 (89.52 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 127.0.0.1:50010 (localhost)
Hostname: myhost
Decommission Status : Normal
Configured Capacity: 127824457728 (119.05 GB)
DFS Used: 49152 (48 KB)
Non DFS Used: 31697813504 (29.52 GB)
DFS Remaining: 96126595072 (89.52 GB)
DFS Used%: 0.00%
DFS Remaining%: 75.20%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Thu Oct 16 15:12:59 CST 2014
Test WordCount
Suppose the directory input contains a file test1.txt with the following content:
hello world
and a file test2.txt with:
hi,world
hello
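The test data can be created like this (file names and contents as above):
$ mkdir input
$ echo "hello world" > input/test1.txt
$ printf "hi,world\nhello\n" > input/test2.txt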
Copy the directory to /data on HDFS (hadoop dfs is deprecated; hdfs dfs is the preferred form in 2.x):
$ hadoop dfs -put input/ /data
Enter the hadoop-2.4.1 directory and run:
$ hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
(If the sources jar does not work on your build, the compiled examples jar at ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar, invoked with the wordcount subcommand, is the usual alternative.)
View the result:
$ hadoop fs -cat /output/part-r-00000
hello 2
hi,world 1
world 1
Finally, delete the /data and /output directories:
$ hadoop dfs -rmr /data
$ hadoop dfs -rmr /output
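For reference, the non-deprecated equivalent in Hadoop 2.x is:
$ hdfs dfs -rm -r /data /output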
Problems encountered
1. JAVA_HOME not found
$ start-dfs.sh
14/10/16 13:53:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
14/10/16 13:53:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It complains that JAVA_HOME is not set, even though it is set in the shell environment. The fix is to set JAVA_HOME explicitly in hadoop-2.4.1/etc/hadoop/hadoop-env.sh.
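For example (the JDK path below is an assumption; substitute your own):
# in hadoop-2.4.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64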
2. Unable to load native-hadoop library for your platform
zsh >> start-dfs.sh
14/10/16 13:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/letian/hadoop-2.4.0/logs/hadoop-sunlt-namenode-myhost.out
localhost: starting datanode, logging to /home/letian/hadoop-2.4.0/logs/hadoop-sunlt-datanode-myhost.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/letian/hadoop-2.4.0/logs/hadoop-sunlt-secondarynamenode-myhost.out
14/10/16 13:58:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This is only a WARN; the cause (the bundled 32-bit native libraries failing to load on a 64-bit platform) is explained at http://stackoverflow.com/questions/19943766/hadoop-unable-to-load-native-hadoop-library-for-your-platform-error-on-centos.
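A common diagnostic is to check the architecture of the bundled native library (the path assumes the layout used in this article); a 32-bit result on a 64-bit OS explains the warning, which is harmless since Hadoop falls back to the builtin Java classes:
$ file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0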
3. SSH connection refused
In my case, the problem went away after rebooting the machine.
4. Cannot access http://localhost:50070/
The following reset worked; note that reformatting erases everything stored in HDFS:
$ stop-all.sh
$ rm -r /tmp/hadoop-*
$ hdfs namenode -format
$ start-all.sh
5. jps shows no DataNode
Try clearing the namenode and datanode directories, then restarting Hadoop.
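A sketch of that reset, assuming the directory layout from this article (this erases all HDFS data):
$ stop-dfs.sh
$ rm -rf /home/letian/hadoop-env/2.4.1/hdfs/namenode/* /home/letian/hadoop-env/2.4.1/hdfs/datanode/*
$ hdfs namenode -format
$ start-dfs.sh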
References
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster
Steps To Setup Hadoop 2.4.0 (Single Node Cluster) on CentOS/RHEL
Installing Hadoop 2.4 on Ubuntu 14.04
Hadoop 2.4 Installing on Ubuntu 14.04 (Single-Node Cluster) - 2014